Tabular data remains one of the most challenging modalities for deep learning due to its heterogeneity, lack of spatial or sequential inductive bias, and small-sample regimes. Key complexities include mixed feature types, high-cardinality categoricals, missing or sparse entries, and weak or irregular feature interactions, all of which make representation learning and generalization difficult for neural networks. While early neural approaches struggled to match tree-based methods, recent advances demonstrate measurable progress across representation learning, regularization, and architectural design. This survey synthesizes findings from more than a decade of research on tabular data, covering two major families of approaches: classical models centered on decision trees and ensembles, and deep learning architectures. We identify three major trends: (1) the shift toward attention-based and hybrid architectures capable of modeling non-local interactions; (2) growing use of pretraining, self-supervision, and foundation-model-style priors to mitigate overfitting; and (3) emerging diffusion, graph-based, and multimodal frameworks that improve robustness and generalization. Despite these advances, tree-based methods remain strong baselines in low-dimensional and small-data contexts. By distilling empirical patterns across benchmarks and applications, this survey clarifies when deep learning delivers real gains for tabular data. We close with a prioritized research agenda on scalability, interpretability, and unified modeling.
Somvanshi et al. (Wed,) studied this question.
synapsesocial.com/papers/69d896046c1944d70ce07408 — DOI: https://doi.org/10.1145/3807777
Shriyank Somvanshi
Texas State University
Sayantan Das
Texas State University
Syed Javed
ACM Computing Surveys
Texas State University