Abstract: This paper examines the main challenges and limitations in developing Large Language Models (LLMs) for Sanskrit. It identifies three key issues. First, data scarcity and quality: the lack of extensive, high-quality, and diverse Sanskrit datasets hinders effective LLM training. Second, linguistic complexity: Sanskrit's intricate grammar, syntax, and morphology pose difficulties for LLMs designed around morphologically simpler languages. Third, cultural and contextual nuance: meaningful LLM output depends on accurately capturing the cultural and historical context of Sanskrit texts. The paper also outlines pathways for future research: collaboration between linguists, cultural scholars, and technologists; the development of specialized datasets and computational resources; and attention to ethical considerations and cultural preservation. Despite these challenges, the paper maintains a positive outlook, suggesting that effective LLMs for Sanskrit are achievable with targeted research and development.
Dr. Nilesh Joshi studied this question.