What question did this study set out to answer?

How to scale a diffusion language model, LLaDA2.0, to 100 billion total parameters by converting a pre-trained auto-regressive model rather than training from scratch.

December 22, 2025. Open Access.

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

Key Points

Conversion of pre-trained auto-regressive (AR) models into diffusion models instead of training from scratch
A 3-phase block-level WSD training scheme: warm-up, stable, and decay
Open-sourcing of both the LLaDA2.0-mini (16B) and LLaDA2.0-flash (100B) variants
Scaling of the LLaDA2.0 models to 100B total parameters
Improved efficiency in deployment
Retention of the advantages of parallel decoding

Abstract

This paper presents LLaDA2.0, a pair of discrete diffusion large language models (dLLMs) that scale up to 100B total parameters through systematic conversion from auto-regressive (AR) models, establishing a new paradigm for frontier-scale deployment. Instead of costly training from scratch, LLaDA2.0 upholds the principles of knowledge inheritance, progressive adaptation, and efficiency-aware design, and seamlessly converts a pre-trained AR model into a dLLM with a novel 3-phase block-level WSD-based training scheme: progressively increasing the block size in block diffusion (warm-up), large-scale full-sequence diffusion (stable), and reverting to compact-size block diffusion (decay). Along with post-training alignment with SFT and DPO, we obtain LLaDA2.0-mini (16B) and LLaDA2.0-flash (100B), two instruction-tuned Mixture-of-Experts (MoE) variants optimized for practical deployment. By preserving the advantages of parallel decoding, these models deliver superior performance and efficiency at the frontier scale. Both models were open-sourced.
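The three phases named in the abstract can be read as a schedule over the diffusion block size. The following is only a minimal sketch of that idea, not the authors' implementation: the function name `block_size_schedule`, the block sizes, the sequence length, and the step counts are all hypothetical values chosen for illustration, since the abstract does not state them.

```python
def block_size_schedule(step, warmup_steps, stable_steps, decay_steps,
                        warmup_sizes=(32, 64, 256, 1024),
                        seq_len=4096, final_size=32):
    """Return the diffusion block size to use at a given training step.

    All numeric defaults are hypothetical; the abstract only describes
    the shape of the schedule (grow, full sequence, shrink).
    """
    total_steps = warmup_steps + stable_steps + decay_steps
    if not 0 <= step < total_steps:
        raise ValueError(f"step must be in [0, {total_steps})")

    if step < warmup_steps:
        # Warm-up: split the phase evenly across increasing block sizes.
        index = step * len(warmup_sizes) // warmup_steps
        return warmup_sizes[index]

    if step < warmup_steps + stable_steps:
        # Stable: full-sequence diffusion, i.e. one block spans the sequence.
        return seq_len

    # Decay: revert to a compact block size.
    return final_size


if __name__ == "__main__":
    for step in (0, 50, 99, 100, 1099, 1100):
        print(step, block_size_schedule(step, 100, 1000, 100))
```

In this sketch the stable phase is simply the limiting case where the block size equals the sequence length, which is one way to view full-sequence diffusion as part of the same block-level family.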

Cite this study

Bie et al. studied this question.

www.synapsesocial.com/papers/69488bc877063b71e748ceaa — DOI: https://doi.org/10.48550/arxiv.2512.15745

Authors

Tiwei Bie

Meng Cao

Kun Chen
