What question did this study set out to answer?

This research aims to improve sarcasm detection in user-generated text by combining RoBERTa embeddings with linguistic features.

May 8, 2026 · Open Access

HYSARD: A Hybrid Feature-Fusion Model for Sarcasm Detection Using RoBERTa Embeddings and Linguistic Features

Key points

This research aims to improve sarcasm detection in user-generated text by combining RoBERTa embeddings with linguistic features.
Developed HYSARD, a hybrid feature-fusion model that combines RoBERTa-based sentence embeddings with sentiment polarity, stylistic markers, syntactic patterns, and TF-IDF lexical cues.
Applied Random Forest-based feature selection to refine the fused feature space and reduce redundancy.
Used SMOTE to mitigate class imbalance during training.
Achieved an F1-score of 0.80 on the SemEval-2022 iSarcasmEval dataset.
Reported consistent performance on iSarcasmEval and on the balanced Main and Political subsets of SARC 2.0, with held-out test-set error analysis showing strong class-wise discrimination.
The ablation study confirmed that combining contextual embeddings with explicit linguistic cues improves sarcasm detection over reduced feature configurations.
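
To make the class-imbalance step concrete, the following is a minimal pure-Python sketch of the interpolation idea behind SMOTE: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, the parameters `k` and `seed`, and the toy data are assumptions for illustration only; the paper's actual implementation and settings are not given here.

```python
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors.

    Illustrative only: assumes at least two minority samples and
    uses squared Euclidean distance to find neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by distance to the chosen base sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class: four 2-D feature vectors (made-up values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=5)
print(len(new_points))
```

Because every synthetic point lies between two real minority samples, oversampling of this kind should be applied to the training split only, which matches the summary's statement that SMOTE is used during training.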

Abstract

Sarcasm detection remains a challenging task in natural language processing because sarcastic expressions often convey meanings that contradict their literal wording. Although transformer-based encoders such as RoBERTa capture contextual semantics effectively, sparse linguistic signals common in sarcastic user-generated text, such as exaggerated punctuation, elongated words, capitalization, and sentiment contrast, may not always remain explicitly accessible in the final sentence representation. To address this limitation, we propose HYSARD, a hybrid feature-fusion model that combines RoBERTa-based sentence embeddings with complementary linguistic features, including sentiment polarity, stylistic markers, syntactic patterns, and TF-IDF lexical cues. The resulting feature space is refined through Random Forest-based feature selection to reduce redundancy and improve robustness, while SMOTE mitigates class imbalance during training. We evaluate HYSARD on the SemEval-2022 iSarcasmEval dataset and the balanced Main and Political subsets of SARC 2.0. Results show strong and consistent performance across datasets, with an F1-score of 0.80 on iSarcasmEval, while held-out test-set error analysis further highlights strong class-wise discrimination. The ablation study further confirms that combining contextual embeddings with explicit linguistic cues improves sarcasm detection over reduced feature configurations. These findings show that hybrid feature fusion remains an effective and practical strategy for sarcasm detection in noisy social media text.
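
As a rough illustration of the stylistic markers named in the abstract (exaggerated punctuation, elongated words, capitalization) and of feature fusion by concatenation, here is a small standard-library sketch. The feature names, regular expressions, and the placeholder embedding are assumptions made for this example; they are not the authors' feature definitions, and a real sentence embedding would come from a RoBERTa encoder rather than a hard-coded list.

```python
import re

def stylistic_features(text):
    """Count simple surface cues in a text (hypothetical feature set)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    # Runs of two or more '!' or '?' characters, such as "!!!" or "?!".
    exaggerated_punct = len(re.findall(r"[!?]{2,}", text))
    # Words with the same letter three or more times in a row, such as "sooo".
    elongated = sum(1 for t in tokens if re.search(r"([A-Za-z])\1{2,}", t))
    # Fully upper-case words of at least two letters, such as "GREAT".
    all_caps = sum(1 for t in tokens if len(t) >= 2 and t.isupper())
    return {
        "exaggerated_punct": exaggerated_punct,
        "elongated_words": elongated,
        "all_caps_words": all_caps,
        "caps_ratio": all_caps / len(tokens) if tokens else 0.0,
    }

def fuse(embedding, features):
    """Concatenate a dense embedding with the handcrafted feature values."""
    return list(embedding) + [float(v) for v in features.values()]

text = "Oh GREAT, another Monday!!! I am sooo happy..."
feats = stylistic_features(text)
# Placeholder 4-dimensional vector standing in for a sentence embedding.
fused = fuse([0.1, -0.2, 0.3, 0.0], feats)
print(feats, len(fused))
```

Keeping these cues as explicit columns next to the embedding is what allows a downstream feature-selection step to keep or discard each one individually.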

Cite this study

Jabri et al. (2026) studied this question.

www.synapsesocial.com/papers/69fd7f65bfa21ec5bbf07ec4 — DOI: https://doi.org/10.3390/bdcc10050144

Authors

Ismail Jabri

Zine Eddine Louriga

Aziza El Ouaazizi

Journals

Big Data and Cognitive Computing

Institutions

Sidi Mohamed Ben Abdellah University
