This paper presents IndicRxNorm-LexMap-15K, a 15,000-row multilingual Indic medicine terminology instruction dataset grounded in RxNorm/RxNav metadata, covering Hindi, Bengali, Hinglish, and Banglish across six task families: medicine NER, RxNorm normalization, RxCUI entity linking, drug-field extraction, terminology summaries, and safety-boundary refusal. Using Adaption Labs' Adaptive Data platform, the dataset was refined from quality grade B to A. We fine-tuned Gemma 3 270M with LoRA/PEFT using Unsloth, improving JSON parse rate from 7.22% to 71.67% and RxCUI exact-match rate from 2.33% to 74.42% on a held-out sample. We also release a Kaggle Community Benchmark for structured medicine normalization evaluation. Dataset, LoRA adapter, and benchmark are publicly available. This work is for research and terminology normalization only — not diagnosis, prescription, or treatment advice.
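The two headline metrics, JSON parse rate and RxCUI exact-match rate, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function name `score_outputs`, the `rxcui` output key, and the choice to divide both metrics by the total number of samples are assumptions made here for illustration; the abstract does not state the output schema or the denominators used, and the sample values are placeholders.

```python
import json


def score_outputs(outputs, gold_rxcuis, key="rxcui"):
    """Return (json_parse_rate, rxcui_exact_match_rate) as fractions in [0, 1].

    Assumptions (not taken from the paper): each model output should be a
    JSON object, the predicted identifier lives under `key`, and both rates
    are computed over all samples.
    """
    if len(outputs) != len(gold_rxcuis):
        raise ValueError("outputs and gold_rxcuis must have the same length")
    if not outputs:
        return 0.0, 0.0

    parsed = 0
    matched = 0
    for raw, gold in zip(outputs, gold_rxcuis):
        try:
            obj = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            continue  # unparseable output counts against both metrics
        if not isinstance(obj, dict):
            continue  # valid JSON, but not the object shape assumed here
        parsed += 1
        # Compare as stripped strings so 161 and "161" are treated alike.
        if str(obj.get(key, "")).strip() == str(gold).strip():
            matched += 1

    total = len(outputs)
    return parsed / total, matched / total


# Placeholder model outputs and gold identifiers, for illustration only.
sample_outputs = ['{"rxcui": "161"}', "not json", '{"rxcui": "999"}', "[1]"]
sample_gold = ["161", "161", "5640", "1"]
parse_rate, exact_match = score_outputs(sample_outputs, sample_gold)
```

Under these assumptions the exact-match rate can never exceed the parse rate, so a reported evaluation may well use a different denominator (for example, only rows that carry a gold RxCUI) than this sketch does.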
Krishnendu Dasgupta
Krishnendu Dasgupta (Tue,) studied this question.
www.synapsesocial.com/papers/69fd7f86bfa21ec5bbf080ec — DOI: https://doi.org/10.5281/zenodo.20040893