Bioarchaeology generates growing volumes of data from finite and often destructively sampled resources, which makes data reusability critical under the FAIR (Findable, Accessible, Interoperable, Reusable) and CARE (Collective Benefit, Authority to Control, Responsibility, Ethics) principles. However, much valuable information remains trapped in grey literature, particularly PDF-based reports, which limits discoverability and machine processing. This paper explores Natural Language Processing (NLP) and Named Entity Recognition (NER) techniques to improve access to osteoarchaeological and palaeopathological data in grey literature. The research developed and evaluated the Osteoarchaeological and Palaeopathological Entity Search (OPES), a lightweight prototype system designed to extract relevant terms from PDF documents within the Archaeology Data Service archive. Unlike transformer-based Large Language Models, OPES employs interpretable, computationally efficient, and sustainable NLP methods. A structured user evaluation (n = 83) involving students (42), experts (26), and the general public (15) assessed five success criteria: usefulness, time-saving ability, accessibility, reliability, and likelihood of reuse. Results demonstrate that, while limitations remain in reliability and expert engagement, NLP and NER show clear potential to increase the FAIRness of osteoarchaeological datasets. The study emphasises the continued need for robust evaluation methodologies in heritage AI applications as new technologies emerge.
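The abstract does not describe how OPES is implemented. As a rough illustration of the kind of interpretable, non-transformer entity extraction it refers to, the following Python sketch matches a small hand-written term list against text that is assumed to have already been extracted from a PDF. The term list, the labels, and the names `GAZETTEER`, `Entity`, `build_pattern`, and `extract_entities` are all assumptions made for this example and are not taken from OPES.

```python
import re
from typing import List, NamedTuple

# Hypothetical mini-gazetteer: terms and labels are illustrative only,
# not the vocabulary used by OPES.
GAZETTEER = {
    "cribra orbitalia": "PATHOLOGY",
    "periostitis": "PATHOLOGY",
    "osteoarthritis": "PATHOLOGY",
    "femur": "SKELETAL_ELEMENT",
    "tibia": "SKELETAL_ELEMENT",
}


class Entity(NamedTuple):
    text: str   # surface form as it appears in the document
    label: str  # entity type looked up in the gazetteer
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


def build_pattern(terms) -> "re.Pattern[str]":
    # Try longer terms first so a multi-word entry is preferred over any
    # shorter entry that could match at the same position.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # Word boundaries avoid matching a term inside a longer word.
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


PATTERN = build_pattern(GAZETTEER)


def extract_entities(text: str) -> List[Entity]:
    """Return every gazetteer term found in `text`, in document order."""
    return [
        Entity(m.group(0), GAZETTEER[m.group(0).lower()], m.start(), m.end())
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Skeleton 12 showed Cribra orbitalia and periostitis of the left tibia."
    for entity in extract_entities(sample):
        print(entity.label, entity.text, entity.start, entity.end)
```

Every match in a lookup of this kind can be traced back to a specific dictionary entry, which is one sense in which such methods are interpretable. The trade-off is that spelling variants or terms missing from the list go undetected.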
Alphaeus Lien-Talks
Heritage
University of York
Historic England
Alphaeus Lien-Talks studied this question.
www.synapsesocial.com/papers/69706c87b6488063ad5c19d7 — DOI: https://doi.org/10.3390/heritage9010035