What question did this study set out to answer?

The study asks how modern deep learning frameworks, specifically PyTorch and TensorFlow, can be used for Bayesian item response theory (IRT) parameter estimation.

April 15, 2026 · Open Access

Fitting Bayesian Item Response Theory Models Using Deep Learning Computational Frameworks

Key Points

Provided a didactic introduction to PyTorch and TensorFlow for Bayesian IRT parameter estimation.
Framed IRT models as graphical models to bridge probabilistic machine learning and psychometrics.
Compared Hamiltonian Monte Carlo (HMC) and variational inference (VI) estimators in a unified computational environment.
Conducted simulation studies evaluating the mean squared error (MSE) and bias of parameter estimates.
Presented three empirical case studies comparing deep learning implementations with established IRT software.
Simulation studies indicated low MSE and bias for both estimation approaches in low-dimensional settings.
VI may underestimate aspects of posterior uncertainty in higher-dimensional scenarios.
VI remains a compelling option when computational efficiency and scalability are priorities, especially with GPU acceleration.

Abstract

PyTorch and TensorFlow are two widely adopted deep learning frameworks that provide comprehensive computational libraries for developing and fitting complex models. Motivated by the technical barriers in recent item response theory (IRT) work and the lack of practice-oriented tutorials, we demonstrate how these platforms can be used for Bayesian IRT parameter estimation. We provide a didactic yet in-depth introduction to PyTorch and TensorFlow in a psychometric context, frame IRT models as graphical models, and offer step-by-step guidance that bridges probabilistic machine learning and psychometrics. We illustrate how to use these platforms to estimate psychometric models widely used in educational testing, psychological measurement, and behavioral assessment, namely dichotomous and polytomous IRT models and their multidimensional extensions. We compare Hamiltonian Monte Carlo (HMC) and variational inference (VI) estimators for these models in a unified computational environment. Simulation studies show that both approaches yield parameter estimates with low mean squared error and bias in low-dimensional settings, while also indicating that VI might underestimate aspects of posterior uncertainty in higher-dimensional scenarios. Nonetheless, for practitioners who prioritize computational efficiency and scalability, especially when graphics processing unit (GPU) acceleration is available, VI remains a compelling option. Three empirical case studies further compare PyTorch- and TensorFlow-based implementations with established IRT software in applied settings. We conclude by discussing the broader potential of integrating contemporary deep learning tools and perspectives into psychometric research.
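Both HMC and VI work from the same quantity: the model's unnormalized log posterior. The following is a minimal, dependency-free sketch of that quantity for one dichotomous model, assuming a two-parameter logistic (2PL) specification with standard-normal priors on abilities, difficulties, and log-discriminations. The priors and all function names here are hypothetical illustrations, not the study's code. In PyTorch or TensorFlow the same expression would be written with tensor operations, so that automatic differentiation can supply the gradients that HMC and gradient-based VI both require.

```python
import math
from typing import Sequence


def log_normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Log density of a univariate normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def irt_2pl_log_joint(
    responses: Sequence[Sequence[int]],
    theta: Sequence[float],
    log_a: Sequence[float],
    b: Sequence[float],
) -> float:
    """Unnormalized log posterior of a hypothetical 2PL model.

    responses[j][i] is 1 if person j answered item i correctly, else 0.
    theta[j] is the ability of person j; log_a[i] and b[i] are the
    log-discrimination and difficulty of item i.
    """
    # Hypothetical N(0, 1) priors, chosen for illustration only.
    lp = sum(log_normal_pdf(t) for t in theta)
    lp += sum(log_normal_pdf(la) for la in log_a)
    lp += sum(log_normal_pdf(d) for d in b)

    # Bernoulli likelihood with P(y = 1) = sigmoid(a_i * (theta_j - b_i)).
    for j, row in enumerate(responses):
        for i, y in enumerate(row):
            eta = math.exp(log_a[i]) * (theta[j] - b[i])
            # log P(y=1) = log_sigmoid(eta); log P(y=0) = log_sigmoid(-eta)
            lp += log_sigmoid(eta) if y == 1 else log_sigmoid(-eta)
    return lp


if __name__ == "__main__":
    # Two persons, two items; all parameters at zero.
    value = irt_2pl_log_joint(
        [[1, 0], [1, 1]], theta=[0.0, 0.0], log_a=[0.0, 0.0], b=[0.0, 0.0]
    )
    print(round(value, 4))  # prints -8.2862
```

With every parameter at zero, each response probability is 0.5, so the likelihood contributes 4 log(1/2) and the six priors contribute 6 × (−½ log 2π). Parameterizing discriminations on the log scale keeps them positive without constrained sampling, a common choice for HMC and VI alike.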

Authors

Nanyu Luo

Yuting Han

Jun He

Journal

Journal of Educational and Behavioral Statistics

Institutions

University of Toronto

University of Florida

Xi’an Jiaotong-Liverpool University