What question did this study set out to answer?

To extract definitions of technical terms in drone forensics using large language models as a knowledge base.

March 26, 2026 | Open Access

Towards terms definition extraction in drone forensics using large language model

Key Points

Aimed to extract definitions of technical terms in drone forensics using large language models as a knowledge base.
Utilized five large language models: Davinci-002, ChatSonic, Claude, GPT3.5, and GPT4.
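Modeled technical term extraction as a named entity recognition problem, with the entity span mentioned in a log message as the target term.
Used InstructOR embeddings to represent both the prompt and the generated text.
Treated selection of the best definition as an information retrieval problem, choosing the definition with the lowest sum of three distance metrics.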
Conducted further experiments with newer models like GPT4o and Microsoft Copilot.
Identified Microsoft Copilot, a search-augmented model prompted via the web interface, as producing definitions better aligned with the reference glossary than the other models.
Validated the sum distance score as a reliable measure for definition selection through statistical analysis and manual curation.
Demonstrated the feasibility of using conversational LLMs for precise information extraction in specialized domains.

Abstract

In the era of large language models (LLMs), many natural language processing tasks have been impacted. The availability of textual data in the form of large corpora has been utilized to build intelligent language models with large numbers of parameters. This is an advantage for domain-specific problems such as drone forensics, where only small amounts of data are at hand, which makes information extraction hard. In this paper, we propose to use conversational pre-trained LLMs as the knowledge base for performing definition extraction of drone technical terms. We use five different LLMs, namely Davinci-002, ChatSonic, Claude, GPT3.5, and GPT4, as the knowledge base to generate the definitions of drone technical terms. Extracting technical terms is modeled as a named entity recognition problem, where the entity span mentioned in a log message is the target term. To represent the prompt and the generated text, we use InstructOR as the embedding. Since five different chat LLMs generate the definitions, we model the problem of selecting the best definition as an information retrieval problem. Three distance metrics are used, and the definition with the least sum distance score is selected as the best. An in-depth statistical analysis, error analysis, and manual curation indicate that the sum distance is a legitimate basis for choosing the best definition; this is confirmed by a manual investigation of the definition sentences generated by each LLM. A further experiment using a well-structured prompt is conducted on more recent models, such as GPT4o, Gemini 2.0, Claude 3.5 Sonnet, Deepseek V3, Qwen 2.5, Meta AI, and Microsoft Copilot. The experimental results show that Microsoft Copilot, a search-augmented model prompted via the web interface, produces definitions that are better aligned with the reference glossary.
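
The selection step described in the abstract (three distance metrics, lowest sum wins) can be illustrated with a minimal sketch. The abstract does not name the three metrics, so cosine, Euclidean, and Manhattan distance are assumed here purely for illustration. It is also assumed that distances are measured between the prompt embedding and each candidate definition's embedding. The short vectors stand in for InstructOR embeddings, and all function and model names are hypothetical.

```python
import math

# ASSUMPTION: the paper does not name its three metrics here; cosine,
# Euclidean, and Manhattan distance are illustrative stand-ins.

def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def sum_distance(a, b):
    """Unweighted sum of the three distances (weighting is an assumption)."""
    return cosine_distance(a, b) + euclidean_distance(a, b) + manhattan_distance(a, b)

def select_best_definition(prompt_embedding, candidate_embeddings):
    """Return the candidate with the lowest sum distance, plus all scores."""
    scores = {
        name: sum_distance(prompt_embedding, embedding)
        for name, embedding in candidate_embeddings.items()
    }
    best = min(scores, key=scores.get)
    return best, scores

# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
prompt_embedding = [1.0, 0.0, 0.0]
candidate_embeddings = {
    "model_a": [0.9, 0.1, 0.0],
    "model_b": [0.0, 1.0, 0.0],
    "model_c": [0.5, 0.5, 0.0],
}

best_model, scores = select_best_definition(prompt_embedding, candidate_embeddings)
print(best_model, scores)
```

In this toy data, the candidate whose vector lies closest to the prompt vector under all three metrics receives the lowest combined score. A real pipeline would replace the toy vectors with embeddings of the prompt and of each LLM-generated definition.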

Cite This Study

Silalahi et al. studied this question.

synapsesocial.com/papers/69c4ccbbfdc3bde448918455 https://doi.org/10.1016/j.iswa.2026.200655