What question did this study set out to answer?

To develop LLaDA2.0, a family of discrete diffusion language models scaling up to 100 billion total parameters, by converting pre-trained auto-regressive models instead of training from scratch.

December 22, 2025 · Open Access

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

Key Points

Conversion of pre-trained auto-regressive (AR) models into discrete diffusion models
Block-level training with a 3-phase scheme (warm-up, stable, decay)
Open-sourcing of both LLaDA2.0-mini (16B) and LLaDA2.0-flash (100B) variants
LLaDA2.0 models successfully scaled to 100B total parameters
Showed improved efficiency in deployment
Retained the advantages of parallel decoding
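As a rough illustration of what block-level training involves (not the authors' code), block diffusion is commonly implemented with a block-causal attention mask: each position attends bidirectionally to every position in its own block and causally to all earlier blocks. The helper `block_causal_mask` below is hypothetical and assumes the sequence is split into equal-size blocks, with the last block possibly shorter.

```python
from __future__ import annotations


def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Return mask where mask[i][j] is True if query i may attend to key j.

    Hypothetical block-diffusion layout: attention is bidirectional inside a
    block and causal across blocks (block of j must not come after block of i).
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    for row in block_causal_mask(4, 2):
        print(["x" if allowed else "." for allowed in row])
```

With `block_size=1` the mask reduces to an ordinary causal (AR) mask, and with `block_size` equal to the sequence length it becomes full bidirectional attention. This is one way to see how changing the block size can bridge an AR model and full-sequence diffusion.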

Abstract

This paper presents LLaDA2.0, a family of discrete diffusion large language models (dLLMs) scaling up to 100B total parameters through systematic conversion from auto-regressive (AR) models, establishing a new paradigm for frontier-scale deployment. Instead of costly training from scratch, LLaDA2.0 follows the principles of knowledge inheritance, progressive adaptation, and efficiency-aware design, and seamlessly converts a pre-trained AR model into a dLLM with a novel 3-phase block-level WSD (warm-up, stable, decay) training scheme: progressively increasing the block size in block diffusion (warm-up), large-scale full-sequence diffusion (stable), and reverting to compact-size block diffusion (decay). With post-training alignment via SFT and DPO, we obtain LLaDA2.0-mini (16B) and LLaDA2.0-flash (100B), two instruction-tuned Mixture-of-Experts (MoE) variants optimized for practical deployment. By preserving the advantages of parallel decoding, these models deliver superior performance and efficiency at the frontier scale. Both models have been open-sourced.
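The 3-phase schedule described in the abstract can be pictured as a function from training step to block size. The sketch below is an illustration under stated assumptions, not the paper's recipe: the helper name `block_size_at`, the warm-up block sizes, and all step counts are hypothetical. Starting the warm-up at a block size of 1 is meant to reflect that block diffusion with one-token blocks roughly reduces to next-token prediction, so training would begin close to the source AR model.

```python
from typing import Sequence


def block_size_at(
    step: int,
    warmup_sizes: Sequence[int],
    steps_per_stage: int,
    stable_steps: int,
    seq_len: int,
    decay_size: int,
) -> int:
    """Block size at a training step under a warm-up/stable/decay schedule.

    All arguments are illustrative assumptions, not values from the paper.
    """
    if steps_per_stage < 1:
        raise ValueError("steps_per_stage must be at least 1")
    warmup_end = len(warmup_sizes) * steps_per_stage
    if step < warmup_end:
        # Warm-up: move through progressively larger block sizes.
        return warmup_sizes[step // steps_per_stage]
    if step < warmup_end + stable_steps:
        # Stable: one block spans the whole sequence (full-sequence diffusion).
        return seq_len
    # Decay: return to a compact block size for block diffusion.
    return decay_size
```

For example, with warm-up sizes `[1, 4, 32, 512]`, 100 steps per warm-up stage, 1000 stable steps, a 4096-token sequence, and a decay size of 32, steps 0, 150, 400, and 1400 use block sizes 1, 4, 4096, and 32.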

Authors

Tiwei Bie

Meng Cao

Kun Chen
