Tabular data remains one of the most challenging modalities for deep learning due to its heterogeneity, its lack of spatial or sequential inductive bias, and the prevalence of small-sample regimes. Key complexities include mixed feature types, high-cardinality categoricals, missing or sparse entries, and weak or irregular feature interactions, all of which make representation learning and generalization difficult for neural networks. While early neural approaches struggled to match tree-based methods, recent advances demonstrate measurable progress across representation learning, regularization, and architectural design. This survey synthesizes findings from more than a decade of research, covering two major families of approaches to tabular data: classical models centered on decision trees and ensembles, and deep learning architectures. We identify three major trends: (1) the shift toward attention-based and hybrid architectures capable of modeling non-local interactions; (2) growing use of pretraining, self-supervision, and foundation-model-style priors to mitigate overfitting; and (3) emerging diffusion, graph-based, and multimodal frameworks that improve robustness and generalization. Despite these advances, tree-based methods remain strong baselines in low-dimensional and small-data contexts. By distilling empirical patterns across benchmarks and applications, this survey clarifies when deep learning delivers real gains for tabular data. We close with a prioritized research agenda on scalability, interpretability, and unified modeling.
Shriyank Somvanshi
Sayantan Das
Syed Javed
ACM Computing Surveys
Texas State University
Somvanshi et al. studied this question.
www.synapsesocial.com/papers/69d896046c1944d70ce07408 — DOI: https://doi.org/10.1145/3807777