What question did this study set out to answer?

The aim is to explore how large language models can enhance archaeological informatization by automating metadata extraction from excavation reports.

April 18, 2026 (Open Access)

A Study on Archaeological Informatization Using Large Language Models (LLMs) – Proof of Concept for an Automated Metadata Extraction Pipeline from Archaeological Excavation Reports –

Key Points

Reviewed past discussions on archaeological informatization.
Proposed a new approach using large language models (LLMs) for metadata extraction.
Developed a proof of concept (PoC) LLM-based application for document analysis.
Demonstrated the feasibility of using LLMs for automatic information structuring.
Showed improvements in research efficiency and data accessibility.
Highlighted the potential for LLMs to transform knowledge production in archaeology.

Abstract

This is a translated version of the following article: KIM Hongyeon, 2025, "대형 언어 모델(LLM)을 활용한 고고학 정보화 연구 – 발굴조사보고서의 메타데이터 자동 추출 파이프라인 개념 검증 –," Korean Journal of Heritage: History & Science, 58(3), pp. 34–61. DOI: 10.22755/kjchs.2025.58.3.34

The translation was condensed for readability and includes supplementary notes (footnotes) to aid international readers' understanding. These modifications do not affect the content, results, or interpretation of the original paper. This English edition was translated and edited by the author with the permission of the National Research Institute of Cultural Heritage, Korea. Copyright © 2025 National Research Institute of Cultural Heritage, Korea.

The field of archaeology handles vast quantities of data, much of which exists in unstructured or semi-structured textual forms, which poses persistent challenges to its systematic use and dissemination. Previous efforts at informatization have often failed to fundamentally improve the research environment or data accessibility due to the unique characteristics of archaeological data and its ever-increasing volume.

This paper first systematically reviews various past discussions on archaeological informatization against this backdrop and proposes a new strategy to approach this problem using large language models (LLMs), which have recently emerged as an innovative technology. Specifically, it explores the potential of using LLMs for automatic metadata extraction and information structuring from documents such as excavation reports, which often interweave repetitive narrative structures with standardized informational elements. Through this, it investigates the possibilities for improving research efficiency as well as the multifaceted use of archaeological knowledge in academic research, exhibitions, education, and cultural content development.

In particular, the proof of concept (PoC) results of an LLM-based informatization application developed by the author serve as crucial evidence, going beyond a simple technology introduction, to propose a practical direction for redesigning the core structure of archaeological informatization in a more realistic and flexible manner. Furthermore, it suggests the potential for expanding knowledge services using LLMs post-informatization and highlights the potential for data-driven analysis to become a routine tool in archaeological research. Ultimately, this study presents new possibilities for advancing archaeology by applying LLMs as a core technology to revolutionize the overall methods of knowledge production, access, and interpretation.

Keywords: Artificial Intelligence (AI), Large Language Models (LLM), Excavation Reports, Digital Archaeology, Metadata
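As a rough illustration of the kind of extraction step the abstract describes, the sketch below sends a report to a model, asks for a fixed set of fields as JSON, and validates the reply before it is stored. It is not the author's pipeline: the field names (`site_name`, `location`, `period`, `feature_types`), the prompt wording, and the `llm` callable are all assumptions made for this example, since the abstract does not specify the schema or the model interface.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical field list; the paper's actual metadata schema is not given here.
FIELDS = ("site_name", "location", "period", "feature_types")


@dataclass
class ReportMetadata:
    site_name: Optional[str] = None
    location: Optional[str] = None
    period: Optional[str] = None
    feature_types: List[str] = field(default_factory=list)


def build_prompt(report_text: str) -> str:
    """Ask the model for one JSON object restricted to the illustrative fields."""
    return (
        "Extract the following fields from the excavation report and answer "
        "with a single JSON object. Use null when a field is not stated.\n"
        f"Fields: {', '.join(FIELDS)}\n\nReport:\n{report_text}"
    )


def parse_response(raw: str) -> ReportMetadata:
    """Validate the model's reply: unknown keys are dropped, bad shapes rejected."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    features = data.get("feature_types") or []
    if not isinstance(features, list):
        raise ValueError("feature_types must be a list")

    def text(key: str) -> Optional[str]:
        value = data.get(key)
        return None if value is None else str(value)

    return ReportMetadata(
        site_name=text("site_name"),
        location=text("location"),
        period=text("period"),
        feature_types=[str(item) for item in features],
    )


def extract_metadata(report_text: str, llm: Callable[[str], str]) -> ReportMetadata:
    """Run one report through the prompt, model call, and validation steps.

    `llm` is any function mapping a prompt string to the model's text reply;
    it stands in for whichever model client a real pipeline would use.
    """
    return parse_response(llm(build_prompt(report_text)))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM provider and makes the validation step testable with a stub that returns canned JSON.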

Authors

Hongyeon Kim

Institutions

Hernia Center

Korea National Open University

Korea National University of Cultural Heritage
