Machine translation has increasingly shifted toward Neural Machine Translation (NMT) because of its ability to handle input and output sequences of varying lengths. Attention mechanisms let an NMT model focus on the most relevant parts of the source sentence rather than relying solely on a fixed representation of the entire input. While NMT improves translation quality by addressing long-range dependencies and contextual understanding, it also requires a large parallel corpus for training, which is a challenge for low-resource languages. The main focus of this research is to address the unique challenges of translating Ayurvedic texts using NMT. Ayurvedic texts contain specialized and scientific vocabulary related to medicines and treatments, which makes translation more complex and demands an efficient approach to obtain accurate results. In addition, the content of Ayurvedic textbooks is written in the form of shlokas (verses), which are composed of highly complex compound words. To simplify the translation process, this work uses a sandhi splitter module and an Anvaya Generator (word reordering) module. Building an NMT system for the low-resource language pair Sanskrit-Malayalam requires a parallel corpus, particularly for Ayurvedic textbooks. Because an NMT model needs a minimum amount of parallel data, a number of general-domain Sanskrit textbooks containing shlokas were also considered when developing the parallel corpora. The authors developed a parallel corpus for the Anvaya Generator, the sandhi splitter, and translation.
Four NMT models were developed, trained, and tested specifically with shlokas as input. Two of them are a basic Transformer model with attention and an encoder-decoder model using Long Short-Term Memory (LSTM) with attention. The other two were developed by adding the Sandhi Splitter and Anvaya Generator modules to the pre-processing stages of these Transformer-based and LSTM-based models. The limitations of scarce resources and the grammatical richness of the Sanskrit-Malayalam language pair are addressed through deep learning and the additional modules used in the pre-processing stages. The models were tested with and without the sandhi splitter and Anvaya Generator modules. The Transformer-based model integrated with the sandhi splitter and Anvaya Generator achieved a higher average BLEU score of 73.11 and a unigram BLEU score of 76.93 for translating Sanskrit verses to Malayalam.
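For readers unfamiliar with the metric, the difference between a unigram BLEU score and a BLEU score combined over several n-gram orders can be illustrated with a minimal sentence-level sketch. This is not the authors' evaluation code: the abstract does not say which BLEU variant, tokenization, or smoothing was used, and the function names and the English example tokens below are assumptions made purely for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited
    at most as many times as it occurs in the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def brevity_penalty(candidate, reference):
    """Penalize candidates that are shorter than the reference."""
    c, r = len(candidate), len(reference)
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU: geometric mean of the clipped
    precisions for orders 1..max_n, times the brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, reference) * math.exp(log_mean)


# Illustrative tokens only (hypothetical, not from the paper's corpus).
reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()

unigram_score = bleu(candidate, reference, max_n=1)  # unigram BLEU
bigram_score = bleu(candidate, reference, max_n=2)   # orders 1 and 2 combined
print(round(100 * unigram_score, 2), round(100 * bigram_score, 2))
```

Scores are multiplied by 100 to match the 0 to 100 scale on which the abstract reports its results. Because higher-order n-grams are harder to match, a combined score is typically lower than the unigram score, which is consistent with the ordering of the two figures reported above.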
Sreedeepa H S
Sumam Mary Idicula
ACM Transactions on Asian and Low-Resource Language Information Processing
Cochin University of Science and Technology
And Technology Research (United Kingdom)
www.synapsesocial.com/papers/69fc2c1f8b49bacb8b347c19 — DOI: https://doi.org/10.1145/3813805