Introduction: This article proposes a new Named Entity Recognition dataset, built from PubMed articles, to address the problem of herb-drug interactions. The dataset supports the recognition of herb-drug interaction entities together with their contextual information.

Background: Machine learning and deep learning provide powerful tools for task automation, but they require large quantities of data to perform well. In Natural Language Processing, training deep learning models requires the annotation of large text corpora. While some corpora exist for medical literature, each specific task requires an adapted corpus.

Methods: The dataset was tested using a classical Named Entity Recognition pipeline, as well as new possibilities offered by generative AI.

Results: The dataset provides annotated sentences from around a hundred articles and covers 15 entities, including herbs, drugs, and pathologies, as well as contextual information such as cohort composition, patient information, or pharmacological clues.

Discussion: The study demonstrates that, for drug recognition, this dataset performs comparably to the DDI (Drug-Drug Interaction) corpus, a standard dataset in drug Named Entity Recognition, and that it performs well on most of the entities.

Conclusion: We believe this corpus could help diversify pharmacological Named Entity Recognition.
Anthony Cnudde
Patrick Watrin
Charlotte Nachtegael
The Open Bioinformatics Journal
Cnudde et al. studied this question.
synapsesocial.com/papers/69a75afac6e9836116a2180e — DOI: https://doi.org/10.2174/0118750362377947250903082648