Since the introduction of denoising diffusion probabilistic models (DDPM) in 2020, diffusion-based image generation has achieved remarkable quality but remains computationally demanding for resource-constrained environments. This survey systematically analyzes over 100 publications from 2020 to 2025 and presents a four-layer optimization stack comprising model architecture, control mechanisms, sampling algorithms, and model compression. We address the fundamental “quality–efficiency–control” trilemma through three research questions: (1) the architectural complexity gap between U-shaped network (UNet) and diffusion transformer (DiT) models, (2) the parameter overhead spectrum of control mechanisms, from ControlNet (42%) to NanoControl (0.024%), and (3) the theoretical impact of quantization and bit-width reduction on information loss. Our analysis reveals that instant image generation is achievable through algorithmic innovations such as step distillation and architectural pruning, which reduce the number of sampling steps from 50 to 4–8 (or even 1) and the computational cost by over 90%. We use the floating point operations (FLOPs) efficiency ratio (FER) to highlight the discrepancy between theoretical FLOPs reduction and actual efficiency, which points to the need for system-level optimization. Key findings demonstrate that (i) DiT architectures exhibit high computational density (FER > 1.6), (ii) low-bit quantization such as 8-bit weight and activation (W8A8) maintains an optimal balance between compression and quality (Fréchet inception distance degradation ΔFID < 1.0), and (iii) lightweight control mechanisms enable sophisticated image control with negligible parameter overhead. This survey provides a comprehensive algorithmic optimization roadmap for practitioners targeting efficient on-device image generation.
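To make the W8A8 setting mentioned in the abstract concrete, the following is a minimal sketch of symmetric per-tensor 8-bit quantization applied to both weights and activations, followed by an integer dot product. It is an illustration only and is not taken from the survey: the function names `quantize_int8` and `w8a8_dot`, the per-tensor scaling scheme, and the random test data are all assumptions made for this example, and the reported error is a numerical error on a toy dot product, not an FID measurement.

```python
import random


def quantize_int8(values):
    """Symmetric per-tensor quantization to signed 8-bit integers (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def w8a8_dot(weights, activations):
    """Dot product with 8-bit weights and 8-bit activations (hypothetical helper)."""
    q_w, scale_w = quantize_int8(weights)
    q_a, scale_a = quantize_int8(activations)
    # Accumulate in integers, then rescale once with the product of both scales.
    accumulator = sum(w * a for w, a in zip(q_w, q_a))
    return accumulator * scale_w * scale_a


# Toy data: 256 random weights and activations in [-1, 1].
random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(256)]
activations = [random.uniform(-1.0, 1.0) for _ in range(256)]

exact = sum(w * a for w, a in zip(weights, activations))
approx = w8a8_dot(weights, activations)
abs_error = abs(exact - approx)
print(f"exact={exact:.6f} w8a8={approx:.6f} abs_error={abs_error:.6f}")
```

In this sketch each tensor shares a single scale, which is the simplest choice; practical diffusion quantizers often use finer-grained (for example per-channel or per-timestep) scales, so the numbers printed here should not be read as representative of any published method.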
Se-Jun Ham
Chun-Su Park
Ham et al. studied this question.
www.synapsesocial.com/papers/6994058c4e9c9e835dfd6827 — DOI: https://doi.org/10.3390/electronics15040828