What question did this study set out to answer?

The research aims to address challenges in translating Ayurvedic texts from Sanskrit to Malayalam using neural machine translation techniques.

May 7, 2026

An Efficient Hybrid Deep Learning Approach for Translating Sanskrit Shlokas into Malayalam with Linguistic Preprocessing

Key Points

The research aims to address challenges in translating Ayurvedic texts from Sanskrit to Malayalam using neural machine translation techniques.
Developed a parallel corpus for Ayurvedic and general Sanskrit texts.
Implemented a sandhi splitter and an Anvaya Generator in the preprocessing stage.
Trained and tested four NMT models, including a transformer and an LSTM model.
The transformer model with sandhi splitter and Anvaya Generator achieved a BLEU score of 73.11.
The uni-gram BLEU score reached 76.93 for translating Sanskrit verses to Malayalam.
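
The two scores above measure different things, and a small sketch may help show how. A uni-gram BLEU score counts only matching single tokens, while a cumulative BLEU score also requires longer n-grams to match, so it is normally the lower of the two. The code below is a minimal sentence-level sketch with a single reference. It assumes whitespace tokenization and no smoothing, and the function names are illustrative. The abstract does not state how the paper tokenized its output or averaged its scores, so this is not the authors' evaluation script.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def brevity_penalty(candidate, reference):
    """Penalize candidates that are shorter than the reference."""
    c, r = len(candidate), len(reference)
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU on a 0..1 scale (multiply by 100 for the usual scale).
    max_n=1 gives uni-gram BLEU; max_n=4 gives the common cumulative BLEU-4."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, reference) * math.exp(log_mean)
```

Calling `bleu(candidate, reference, max_n=1)` and `bleu(candidate, reference, max_n=4)` on the same tokenized sentence pair shows the gap between the two metrics for any imperfect translation.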

Abstract

Machine translation has increasingly shifted toward Neural Machine Translation (NMT) because of its ability to handle input and output sequences of varying lengths. Attention mechanisms let an NMT system focus on the most relevant parts of the source sentence rather than relying solely on a fixed representation of the entire input. While NMT improves translation quality by handling long-range dependencies and context, it also requires a large parallel corpus for training, which is a challenge for low-resource languages. This research addresses the particular challenges of translating Ayurvedic texts with NMT. Ayurvedic texts contain specialized scientific vocabulary related to medicines and treatments, which makes translation more complex and calls for an efficient approach to produce accurate output. In addition, Ayurvedic textbooks are written as shlokas (verses) built from highly complex compound words. To simplify translation, this work uses a sandhi splitter module and an Anvaya Generator (word reordering) module. Building an NMT system for the low-resource Sanskrit-Malayalam language pair requires a parallel corpus, particularly for Ayurvedic textbooks. Because an NMT model needs a minimum amount of parallel data, a number of general-domain Sanskrit texts written in shlokas were also included when building the parallel corpora. The authors developed parallel corpora for the Anvaya Generator, the sandhi splitter, and translation. Four NMT models were developed, trained, and tested with shlokas as input. Two are baseline models: a transformer model with attention and an encoder-decoder model using Long Short-Term Memory (LSTM) with attention. The other two add the Sandhi Splitter and Anvaya Generator modules to the preprocessing stage of the transformer-based and LSTM-based models. The limited resources and grammatical richness of the Sanskrit-Malayalam language pair are overcome through deep learning and these additional preprocessing modules. The models were tested with and without the sandhi splitter and Anvaya Generator modules. The transformer-based model integrated with the sandhi splitter and Anvaya Generator achieved a higher average BLEU score of 73.11 and a uni-gram BLEU score of 76.93 for Sanskrit verse to Malayalam translation.
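
The attention idea mentioned in the abstract can be illustrated with a small sketch: instead of compressing the source sentence into one fixed vector, the decoder scores every encoder state against its current query and takes a weighted average. The code below assumes scaled dot-product scoring, the variant used in standard transformers. The abstract does not say which attention variant either model uses, and LSTM encoder-decoder systems often use additive scoring instead. The function names and vectors are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Scaled dot-product attention for a single query vector.

    Returns the attention weights (one per source position) and the
    context vector (the weighted average of the encoder states)."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, state)) / math.sqrt(dim)
        for state in encoder_states
    ]
    weights = softmax(scores)
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context
```

The weights always sum to one, and the source position whose state aligns best with the query receives the largest share, which is how the model "focuses" on the relevant part of the input at each decoding step.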

Authors

Sreedeepa H S

Sumam Mary Idicula

Journals

ACM Transactions on Asian and Low-Resource Language Information Processing

Institutions

Cochin University of Science and Technology

And Technology Research (United Kingdom)
