ABSTRACT It is estimated that the volume of digital data will grow exponentially to reach 180 zettabytes by 2025, and that more than 90% of this data will be unstructured. This phenomenon has driven the shift from unimodal to multimodal text analytics (MTA). Multimodal text analytics first appeared in scholarly literature and industrial use cases in the early 2010s. Since then, it has expanded considerably into other sectors such as healthcare, e‐commerce, education, and public safety. This survey presents a task‐oriented, modality‐inclusive, and dataset‐aware synthesis of recent advances in MTA, offering an in‐depth review of 10 core text analytics tasks through a multimodal lens. We systematically analyze over 160 research studies and categorize more than 120 state‐of‐the‐art models, spanning fusion strategies, representation learning, transformer architectures, and pretrained vision‐language frameworks (e.g., CLIP, ViLBERT). Across a variety of datasets, including CMU‐MOSI, CMU‐MOSEI, IEMOCAP, and MAViT‐Bangla, multimodal models achieve F1‐score improvements of up to 18%–25% over text‐only baselines, as captured in the standardized task‐wise comparison tables included in this survey. Moreover, this survey discusses seven under‐explored tasks, including personality detection, satire detection, and author profiling, and identifies research gaps in modality fusion, dataset diversity, and social inclusivity for these tasks. It not only fills gaps in the current literature by unifying knowledge across different fields, but also charts a future path for researchers working on MTA. It is the first survey to bring all the key tasks in multimodal text analytics together in a contiguous and consistent overview, whereas other surveys either treat multimodal computing at an administrative level or concentrate on a single task.
This article is categorized under:
Algorithmic Development > Text Mining
Algorithmic Development > Web Mining
Application Areas > Society and Culture
Tanusree Nath
Vedika Gupta
Manjari Gupta
Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery
University of Tartu
Banaras Hindu University
Punjabi University
www.synapsesocial.com/papers/69d893626c1944d70ce04642 — DOI: https://doi.org/10.1002/widm.70083