What question did this study set out to answer?

This work aims to enhance the spatial alignment of images generated by text-to-image diffusion models without retraining.

February 6, 2026 | Open Access

InfSplign: Inference-Time Spatial Alignment of Text-to-Image Diffusion Models

Key Points

The study aims to improve the spatial alignment of images generated by text-to-image diffusion models without retraining.
InfSplign is applied at inference time and requires no training.
A compound loss adjusts the noise at every denoising step.
The loss uses cross-attention maps from the backbone decoder to guide object placement and object presence.
InfSplign achieves state-of-the-art performance on the VISOR and T2I-CompBench benchmarks.
It outperforms the strongest existing inference-time baselines and also fine-tuning-based methods.

Abstract

Text-to-image (T2I) diffusion models generate high-quality images but often fail to capture the spatial relations specified in text prompts. This limitation can be traced to two factors: a lack of fine-grained spatial supervision in the training data and the inability of text embeddings to encode spatial semantics. We introduce InfSplign, a training-free inference-time method that improves spatial alignment by adjusting the noise through a compound loss in every denoising step. The proposed loss leverages different levels of cross-attention maps extracted from the backbone decoder to enforce accurate object placement and balanced object presence during sampling. The method is lightweight, plug-and-play, and compatible with any diffusion backbone. Our comprehensive evaluations on VISOR and T2I-CompBench show that InfSplign establishes a new state of the art (to the best of our knowledge), achieving substantial performance gains over the strongest existing inference-time baselines and even outperforming fine-tuning-based methods. The codebase is available on GitHub.
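The abstract does not spell out the loss terms, so the following is only a minimal sketch of what a compound loss over cross-attention maps could look like, not the authors' implementation. It assumes each object token has a 2D attention map, uses a centroid-based hinge for a "left of" relation as the placement term, and uses the gap between peak attention values as the presence term. All function names, the margin, and the weights are hypothetical. In an actual sampler, a loss of this kind would be differentiated with respect to the latent and used to adjust the predicted noise at each denoising step.

```python
from typing import List, Tuple

# Hypothetical representation: rows x cols of non-negative attention weights.
AttentionMap = List[List[float]]


def centroid(attn: AttentionMap) -> Tuple[float, float]:
    """Return the attention-weighted (x, y) centroid, normalized to [0, 1]."""
    height, width = len(attn), len(attn[0])
    total = sum(sum(row) for row in attn)
    if total <= 0:
        raise ValueError("attention map has no mass")
    # Use cell centers (index + 0.5) so the result is symmetric across the map.
    cx = sum(w * (x + 0.5) for row in attn for x, w in enumerate(row))
    cy = sum(w * (y + 0.5) for y, row in enumerate(attn) for w in row)
    return cx / (total * width), cy / (total * height)


def left_of_loss(attn_a: AttentionMap, attn_b: AttentionMap,
                 margin: float = 0.25) -> float:
    """Hinge penalty: zero once A's centroid is at least `margin` left of B's."""
    ax, _ = centroid(attn_a)
    bx, _ = centroid(attn_b)
    return max(0.0, ax - bx + margin)


def presence_loss(attn_a: AttentionMap, attn_b: AttentionMap) -> float:
    """Penalize imbalance between the peak attention of the two objects."""
    peak_a = max(max(row) for row in attn_a)
    peak_b = max(max(row) for row in attn_b)
    return abs(peak_a - peak_b)


def compound_loss(attn_a: AttentionMap, attn_b: AttentionMap,
                  w_spatial: float = 1.0, w_presence: float = 0.5) -> float:
    """Weighted sum of the placement and presence terms (weights are assumed)."""
    return (w_spatial * left_of_loss(attn_a, attn_b)
            + w_presence * presence_loss(attn_a, attn_b))


# Toy maps for the prompt "a dog to the left of a cat".
dog_left = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
cat_right = [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]]

satisfied = compound_loss(dog_left, cat_right)  # dog already left of cat
violated = compound_loss(cat_right, dog_left)   # roles swapped, relation broken
```

With these toy maps the loss is zero when the relation already holds and positive when the objects are swapped, which is the signal a guidance step would push against.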

Cite this study

Rastegar et al. studied this question.

www.synapsesocial.com/papers/698586388f7c464f2300a2a2 — DOI: https://doi.org/10.13016/m2bdjk-f37j

Authors

Sarah Rastegar

Violeta Chatalbasheva

Sieger Falkena
