April 10, 2026 | Open Access

A Survey on Tabular Data: From Tree-based Methods to Tabular Deep Learning

Key Points

This survey aims to explore the challenges and advancements in applying deep learning to tabular data.
Review of over a decade of research on tabular data.
Comparison between classical decision tree models and deep learning architectures.
Identification of trends in attention-based methods and pretraining strategies.
Recent deep learning methods show measurable progress in handling tabular data complexities.
Attention-based and hybrid architectures improve non-local interaction modeling.
Tree-based methods still perform well in small-data contexts.

Abstract

Tabular data remains one of the most challenging modalities for deep learning because of its heterogeneity, its lack of spatial or sequential inductive bias, and the small-sample regimes it often involves. Key complexities include mixed feature types, high-cardinality categoricals, missing or sparse entries, and weak or irregular feature interactions, conditions that make representation learning and generalization difficult for neural networks. While early neural approaches struggled to match tree-based methods, recent advances demonstrate measurable progress across representation learning, regularization, and architectural design. This survey synthesizes findings from more than a decade of research, covering two major families of approaches to tabular data: classical models centered on decision trees and ensembles, and deep learning architectures. We identify three major trends: (1) the shift toward attention-based and hybrid architectures capable of modeling non-local interactions; (2) growing use of pretraining, self-supervision, and foundation-model-style priors to mitigate overfitting; and (3) emerging diffusion, graph-based, and multimodal frameworks that improve robustness and generalization. Despite these advances, tree-based methods remain strong baselines in low-dimensional and small-data contexts. By distilling empirical patterns across benchmarks and applications, this survey clarifies when deep learning delivers real gains for tabular data. We close with a prioritized research agenda on scalability, interpretability, and unified modeling.

Authors

Shriyank Somvanshi

Sayantan Das

Syed Javed

Journals

ACM Computing Surveys

Institutions

Texas State University
