ABSTRACT Cross‐modal information retrieval (CMIR) has emerged as a pivotal research area, enabling efficient retrieval across data of multiple modalities. With the growing production of multimodal data, advanced deep learning frameworks have demonstrated significant promise in aligning and mapping heterogeneous data representations into a unified latent space. This review explores the evolution of advanced deep learning techniques in CMIR, highlighting key advancements, methodologies, and challenges. It focuses especially on intelligent frameworks that leverage architectures such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), Transformers, and generative adversarial networks (GANs) to enhance semantic alignment and retrieval accuracy. It also discusses challenges such as modality imbalance, cross‐representation, and inter‐permeability with other modalities, and provides insight into emerging trends such as multi‐model approaches, generative AI, autoencoders, and large‐scale pretrained models. By synthesizing recent advancements and identifying research gaps, this review aims to provide a foundation for future exploration in intelligent CMIR systems. The findings underscore the transformative potential of advanced deep learning frameworks in addressing the growing demand for accurate and scalable CMIR solutions. This article is categorized under: Fundamental Concepts of Data and Knowledge > Knowledge Representation Technologies > Data Preprocessing Technologies > Machine Learning
Aamir Khan
Nisha Chandran S.
D. R. Gangodkar
Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery
Graphic Era University
Khan et al. studied this question.
www.synapsesocial.com/papers/698586ad8f7c464f2300a600 — DOI: https://doi.org/10.1002/widm.70055