Balinese is a local language that is widely used and spoken by Balinese people, including on social media. The language distinguishes several speech levels that encode politeness and formality. However, the nuances of these levels are often lost in informal digital communication, and there is a significant lack of computational models to classify them automatically, especially for low-resource languages like Balinese. The primary objective of this study is to evaluate the performance of the Multinomial Naive Bayes method combined with Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction, Chi-square feature selection, and the Synthetic Minority Oversampling Technique (SMOTE) in classifying Balinese language levels. The dataset consists of 1,314 annotated social media posts and comments, primarily sourced from Instagram. A Balinese language expert annotated each text with one of six levels that represent varying degrees of politeness and formality. These levels are alus singgih (polite, used for respecting others), alus sor (polite, used for self-humbling), alus mider (polite, used for both respecting others and self-humbling), alus madia (an intermediate level of politeness), basa andap (casual, commonly used in everyday life), and basa kasar (impolite, often used during arguments or toward animals). The experimental results showed that the model achieved an accuracy of 96.53% on the training data and 61.45% on the test data. Additionally, hyperparameter tuning revealed that the Multinomial Naive Bayes model with 2,720 selected features and SMOTE oversampling achieved an accuracy of 91.78%, significantly outperforming the baseline model without feature selection and oversampling, which obtained only 64.93% accuracy.
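The four stages named in the abstract (TF-IDF, Chi-square selection, SMOTE, Multinomial Naive Bayes) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes scikit-learn and NumPy. The `smote_oversample` helper is a hypothetical, simplified stand-in for SMOTE written for this example. The texts are placeholder tokens rather than real Balinese, and the two labels and `k_features = 4` are toy values, not the paper's six levels or 2,720 features.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import NearestNeighbors


def smote_oversample(X, y, k=2, seed=0):
    """Simplified SMOTE (hypothetical helper): create synthetic minority rows
    by interpolating between a sample and one of its k nearest same-class
    neighbours until every class matches the largest class.
    Assumes each class has at least two samples."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for label, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0:
            continue
        X_c = X[y == label]
        n_nb = min(k, len(X_c) - 1)
        # Ask for one extra neighbour because each point matches itself first.
        nn = NearestNeighbors(n_neighbors=n_nb + 1).fit(X_c)
        neighbours = nn.kneighbors(X_c, return_distance=False)[:, 1:]
        base = rng.integers(0, len(X_c), size=n_new)
        pick = neighbours[base, rng.integers(0, n_nb, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
        X_parts.append(X_c[base] + gap * (X_c[pick] - X_c[base]))
        y_parts.append(np.full(n_new, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy corpus: placeholder tokens, imbalanced 6 vs 3 (illustrative only).
texts = [
    "aa bb cc", "aa bb dd", "aa cc dd", "bb cc dd", "aa bb", "cc dd aa",
    "xx yy zz", "xx zz ww", "yy zz ww",
]
labels = ["basa andap"] * 6 + ["alus singgih"] * 3
k_features = 4  # toy value; the abstract reports 2,720 selected features

vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(texts)            # 1. TF-IDF features

selector = SelectKBest(chi2, k=k_features)
X_sel = selector.fit_transform(X_tfidf, labels)      # 2. Chi-square selection

X_bal, y_bal = smote_oversample(X_sel.toarray(), labels)  # 3. oversampling

model = MultinomialNB().fit(X_bal, y_bal)            # 4. Multinomial Naive Bayes

# Classify a new text by reusing the fitted vectorizer and selector.
pred = model.predict(selector.transform(vectorizer.transform(["xx yy"])))
print(pred[0])
```

In a real experiment, oversampling would normally be applied to the training split only, so that synthetic samples do not leak into the test data.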
Putu Widyantara Artanta Wibawa
Cokorda Rai Adi Pramartha
I Gusti Ngurah Anom Cahyadi Putra
SHILAP Revista de lepidopterología
Udayana University
Wibawa et al. (Thu,) studied this question.