Recent advances in Text-to-3D generation remain significantly limited by the capabilities of current 2D vision-language models. When these models are used to distill complex multi-object descriptions, the resulting 3D outputs often suffer from 3D geometric confusion and the Janus problem. To overcome these challenges, we introduce DreamAssemble, a novel framework that treats 3D scenes as compositional assemblies of multiple objects. Specifically, our framework is the first to enable the simultaneous optimization of multiple 3D assets using multi-density neural fields, which helps maintain a consistent structure and greatly improves the fidelity of the generated scenes. Furthermore, by decomposing prompts, our method reduces the variance in the latent space during distillation; this improves its ability to handle abstract textual descriptions and significantly alleviates the Janus problem commonly encountered in Text-to-3D generation. We provide comprehensive experimental results, visualizations, and the corresponding theoretical analysis, which together demonstrate the effectiveness of the proposed method and its potential for advancing 3D generation. Our source code and more results are available at: https://anonymous.4open.science/r/DreamAssemble-F6A3/.
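The abstract does not state how the multiple density fields are combined at render time. A common way to render several per-object neural fields jointly (as in compositional NeRF work) is to sum their densities at each sample point and blend their colors in proportion to each object's share of the density, then apply standard volume rendering. The sketch below illustrates that generic technique under this assumption; it is not DreamAssemble's implementation. The names `ObjectField`, `sphere_field`, `composite`, and `render_ray` are hypothetical, and analytic spheres stand in for learned MLP fields.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ObjectField:
    """One object's field (hypothetical stand-in for a learned neural field)."""
    density: Callable[[Vec3], float]  # sigma_k(x) >= 0
    color: Callable[[Vec3], Vec3]     # c_k(x) in [0, 1]^3


def sphere_field(center: Vec3, radius: float, sigma: float, rgb: Vec3) -> ObjectField:
    """Toy object: constant density inside a sphere, zero outside."""
    def density(x: Vec3) -> float:
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        return sigma if d2 <= radius ** 2 else 0.0
    return ObjectField(density=density, color=lambda x: rgb)


def composite(fields: Sequence[ObjectField], x: Vec3) -> Tuple[float, Vec3]:
    """Assumed composition rule: sum densities, density-weight the colors."""
    sigmas = [f.density(x) for f in fields]
    total = sum(sigmas)
    if total == 0.0:
        return 0.0, (0.0, 0.0, 0.0)
    rgb = [0.0, 0.0, 0.0]
    for s, f in zip(sigmas, fields):
        if s > 0.0:
            c = f.color(x)
            for i in range(3):
                rgb[i] += s * c[i] / total
    return total, (rgb[0], rgb[1], rgb[2])


def render_ray(fields: Sequence[ObjectField], origin: Vec3, direction: Vec3,
               near: float, far: float, n_samples: int = 128) -> Vec3:
    """NeRF-style quadrature along one ray through the composed scene."""
    delta = (far - near) / n_samples
    transmittance = 1.0
    out = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta  # midpoint of each interval
        x = tuple(o + t * d for o, d in zip(origin, direction))
        sigma, c = composite(fields, x)
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            out[k] += weight * c[k]
        transmittance *= 1.0 - alpha
    return (out[0], out[1], out[2])


if __name__ == "__main__":
    red = sphere_field((0.0, 0.0, 0.0), 0.5, 50.0, (1.0, 0.0, 0.0))
    blue = sphere_field((1.5, 0.0, 0.0), 0.5, 50.0, (0.0, 0.0, 1.0))
    # Ray along +z through the red sphere only; expected: approximately (1.0, 0.0, 0.0).
    print(render_ray([red, blue], (0.0, 0.0, -3.0), (0.0, 0.0, 1.0), 0.0, 6.0))
```

Keeping one field per object while summing densities means occlusion between objects is resolved by a single shared transmittance term, and, in a differentiable version of this code, each object's field receives gradients in proportion to its contribution along the ray. That is one plausible way several assets could be optimized simultaneously in one scene; the paper's actual formulation may differ.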
Bo Huang
Jinbao Wang
Dongmei Jiang
IEEE Transactions on Image Processing
University of Chinese Academy of Sciences
Peng Cheng Laboratory
Beijing Academy of Artificial Intelligence
Huang et al. (Thu,) studied this question.
www.synapsesocial.com/papers/69dc87ea3afacbeac03e9ffc — DOI: https://doi.org/10.1109/tip.2026.3676627