What question did this study set out to answer?

This work aims to create and refine a multilingual dataset for medicine normalization in Indic languages.

May 8, 2026 · Open Access

From Uncharted Data to Edge Adaptation: Fine-Tuning Gemma 3 270M for Multilingual Indic RxNorm Medicine Normalization

Key Points

Created and refined IndicRxNorm-LexMap-15K, a multilingual dataset for medicine normalization in Indic languages, grounded in RxNorm/RxNav metadata.
Fine-tuned Gemma 3 270M using LoRA/PEFT, improving the JSON parse rate from 7.22% to 71.67% and the RxCUI exact-match rate from 2.33% to 74.42% on a held-out sample.
Shared the dataset, LoRA adapter, and Kaggle benchmark for community evaluation.

Abstract

This paper presents IndicRxNorm-LexMap-15K, a 15,000-row multilingual Indic medicine terminology instruction dataset grounded in RxNorm/RxNav metadata. It covers Hindi, Bengali, Hinglish, and Banglish across six task families: medicine NER, RxNorm normalization, RxCUI entity linking, drug-field extraction, terminology summaries, and safety-boundary refusal. Using Adaption Labs' Adaptive Data platform, we refined the dataset from quality grade B to A. We fine-tuned Gemma 3 270M with LoRA/PEFT using Unsloth, improving the JSON parse rate from 7.22% to 71.67% and the RxCUI exact-match rate from 2.33% to 74.42% on a held-out sample. We also release a Kaggle Community Benchmark for structured medicine normalization evaluation. The dataset, LoRA adapter, and benchmark are publicly available. This work is intended for research and terminology normalization only, not for diagnosis, prescription, or treatment advice.
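The abstract reports two metrics, JSON parse rate and RxCUI exact-match rate, without stating how each is computed. The sketch below shows one plausible way to score model outputs, and it is not the authors' evaluation code. The function names, the `rxcui` field name, and the choice to report exact match over both all samples and parsed samples are assumptions made for illustration. The RxCUI values in the usage note are placeholders, not real identifiers.

```python
import json
from typing import Dict, Iterable, Optional


def parse_json_object(text: str) -> Optional[dict]:
    """Return the decoded dict if `text` is a valid JSON object, else None."""
    try:
        value = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None
    # Arrays, strings, and numbers are valid JSON but not a structured record.
    return value if isinstance(value, dict) else None


def score_outputs(
    outputs: Iterable[str],
    gold_rxcuis: Iterable[str],
    key: str = "rxcui",  # assumed field name; the paper's schema may differ
) -> Dict[str, float]:
    """Compute parse rate and RxCUI exact-match rates for paired samples."""
    outputs = list(outputs)
    gold = list(gold_rxcuis)
    if len(outputs) != len(gold):
        raise ValueError("outputs and gold_rxcuis must have the same length")
    if not outputs:
        raise ValueError("at least one sample is required")

    parsed = 0
    matched = 0
    for text, expected in zip(outputs, gold):
        record = parse_json_object(text)
        if record is None:
            continue  # unparseable output cannot match
        parsed += 1
        # Compare as stripped strings so 222 and "222" count as equal.
        if str(record.get(key, "")).strip() == str(expected).strip():
            matched += 1

    total = len(outputs)
    return {
        "json_parse_rate": parsed / total,
        "exact_match_over_all": matched / total,
        "exact_match_over_parsed": matched / parsed if parsed else 0.0,
    }
```

As a usage example, `score_outputs(['{"rxcui": "111"}', 'not json', '{"rxcui": "999"}', '{"rxcui": 222}'], ["111", "222", "333", "222"])` parses three of the four outputs and matches two of them. Reporting both denominators makes explicit which one a published exact-match figure refers to.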

Authors

Krishnendu Dasgupta
