Evaluating Natural Language Processing and Named Entity Recognition for Bioarchaeological Data Reuse

January 21, 2026 | Open Access

Key Points

Evaluated NLP and NER techniques for enhancing the discoverability and usability of bioarchaeological data.
Developed and evaluated the Osteoarchaeological and Palaeopathological Entity Search (OPES) system.
Extracted relevant terms from PDF documents in the Archaeology Data Service archive.
Conducted a structured user evaluation with students, experts, and the general public, assessing five criteria.
NLP and NER showed potential to increase the FAIRness of osteoarchaeological datasets.
Limitations were noted in reliability and expert engagement.
User evaluation indicated that success varied across the five criteria.

Abstract

Bioarchaeology continues to generate growing volumes of data from finite and often destructively sampled resources, making data reusability critical under the FAIR (Findable, Accessible, Interoperable, Reusable) and CARE (Collective Benefit, Authority to Control, Responsibility, Ethics) principles. However, much valuable information remains trapped in grey literature, particularly PDF-based reports, which limits its discoverability and machine processing. This paper explores Natural Language Processing (NLP) and Named Entity Recognition (NER) techniques to improve access to osteoarchaeological and palaeopathological data in grey literature. The research developed and evaluated the Osteoarchaeological and Palaeopathological Entity Search (OPES), a lightweight prototype system designed to extract relevant terms from PDF documents within the Archaeology Data Service archive. Unlike transformer-based Large Language Models, OPES employs interpretable, computationally efficient, and sustainable NLP methods. A structured user evaluation (n = 83) involving students (42), experts (26), and members of the general public (15) assessed five success criteria: usefulness, time-saving ability, accessibility, reliability, and likelihood of reuse. The results show that, while limitations remain in reliability and expert engagement, NLP and NER have clear potential to increase the FAIRness of osteoarchaeological datasets. The study emphasises the continued need for robust evaluation methodologies in heritage AI applications as new technologies emerge.
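
The abstract describes OPES only at a high level, so what follows is not the OPES implementation. As an illustration of the kind of interpretable, computationally light technique the abstract contrasts with transformer-based models, here is a minimal sketch of dictionary-based (gazetteer) NER in Python. It assumes the text has already been extracted from a PDF; the lexicon, the `PATHOLOGY` and `ELEMENT` labels, and the `build_pattern` and `extract_entities` functions are all hypothetical.

```python
import re
from typing import Dict, List, NamedTuple


class Entity(NamedTuple):
    start: int   # character offset where the match begins
    end: int     # character offset just past the match
    text: str    # surface form exactly as it appears in the input
    label: str   # entity type taken from the lexicon


# Hypothetical mini-lexicon for illustration only; not the OPES vocabulary.
# Keys are lower-case, single-spaced terms; values are hypothetical labels.
LEXICON: Dict[str, str] = {
    "cribra orbitalia": "PATHOLOGY",
    "enamel hypoplasia": "PATHOLOGY",
    "periostitis": "PATHOLOGY",
    "femur": "ELEMENT",
    "tibia": "ELEMENT",
}


def build_pattern(lexicon: Dict[str, str]) -> re.Pattern:
    """Compile one case-insensitive regex that matches any lexicon term."""
    # Longest terms first, so a multi-word term is tried before any shorter
    # term that could match at the same position.
    terms = sorted(lexicon, key=len, reverse=True)
    # Allow any run of whitespace between words, because text extracted from
    # PDFs often breaks multi-word terms across lines.
    alternatives = [r"\s+".join(re.escape(w) for w in t.split()) for t in terms]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def extract_entities(text: str, lexicon: Dict[str, str] = LEXICON) -> List[Entity]:
    """Return every lexicon term found in `text`, in document order."""
    pattern = build_pattern(lexicon)
    entities = []
    for m in pattern.finditer(text):
        surface = m.group(0)
        # Normalise case and whitespace to look the match up in the lexicon.
        key = " ".join(surface.lower().split())
        entities.append(Entity(m.start(), m.end(), surface, lexicon[key]))
    return entities


if __name__ == "__main__":
    sample = "Cribra orbitalia was recorded; the left femur\nshowed periostitis."
    for ent in extract_entities(sample):
        print(ent.label, repr(ent.text), ent.start, ent.end)
```

Each match traces back to a single lexicon entry, which is what makes this style of method easy to inspect and cheap to run. The trade-off shows up with an input such as "Both femurs were complete.": the plural is not in the lexicon, so nothing is returned unless normalisation (for example lemmatisation) or a larger vocabulary is added.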

Authors

Alphaeus Lien-Talks

Journals

Heritage

Institutions

University of York

Historic England
