Modern applications such as recommender systems rely on user attributes to deliver personalized experiences. User profiling is a key natural language processing (NLP) task that infers demographic and preference information from user-generated content. Twitter, a popular social media platform for sending and receiving messages and sharing information in real time, is a rich source of such information. This study presents a methodology for generating summarized user profiles from Arabic Twitter accounts by predicting three main attributes: gender, interest, and location. Profiling Arabic users poses distinctive challenges because of the language's complex morphology and orthography, which has contributed to a scarcity of prior research. The proposed method uses both classical machine learning and deep learning models to predict a user's gender and interests, and it infers a user's location from the location shared by the majority of their friends. The study also analyzes the effects of preprocessing, feature extraction, and feature selection, and describes the dataset collection and annotation procedures. Experimental results show that the best F1-score for gender prediction (71%) is achieved by an SVM classifier using combined unigram and bigram features with feature selection, while interest prediction achieves its best F1-score (72%) with a BiLSTM model using Word2Vec embeddings. Location prediction reaches an accuracy of 83% using the majority location among a user's friends. These results demonstrate the effectiveness of the proposed methodology for constructing comprehensive user profiles from Arabic Twitter data.
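The location heuristic described above (assign a user the most frequent location among their friends) can be illustrated with a minimal sketch. It assumes friend locations have already been collected and normalized to comparable strings; the function name `predict_location` is hypothetical, and the handling of missing values and ties is an illustrative choice, not necessarily the authors' procedure.

```python
from collections import Counter
from typing import Iterable, Optional


def predict_location(friend_locations: Iterable[Optional[str]]) -> Optional[str]:
    """Hypothetical helper: return the majority location among a user's friends.

    Assumes locations are pre-normalized strings (e.g. a single spelling
    per city); friends with a missing or blank location are ignored.
    """
    # Keep only friends that report a non-empty location.
    known = [loc.strip() for loc in friend_locations if loc and loc.strip()]
    if not known:
        return None  # no evidence from which to infer a location

    # most_common(1) yields [(location, count)]; equal counts keep the
    # order in which locations were first encountered.
    return Counter(known).most_common(1)[0][0]


if __name__ == "__main__":
    friends = ["Riyadh", "Jeddah", "Riyadh", None, ""]
    print(predict_location(friends))
```

For the sample list, "Riyadh" is returned because it appears twice while "Jeddah" appears once. In practice, location strings would first need normalization (for example, mapping Arabic and Latin spellings of the same city to one label), which this sketch does not attempt.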
Afaneh et al. studied this question.