LaviGen: Repurposing 3D Generative Model for Autoregressive Layout Generation

* Equal contribution

✉ Corresponding author

CVPR 2026

TL;DR: We introduce LaviGen, a framework that repurposes 3D generative models for autoregressive 3D layout generation, achieving 19% higher physical plausibility and 65% faster computation than the state of the art.

We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from textual descriptions, LaviGen operates directly in native 3D space. It formulates layout generation as an autoregressive process that explicitly models geometric relations and physical constraints among objects, producing coherent and physically plausible 3D scenes. To further enhance this process, we propose an adapted 3D diffusion model that integrates scene, object, and instruction information, and we employ a dual-guidance self-rollout distillation mechanism to improve efficiency and spatial accuracy. Extensive experiments on the LayoutVLM benchmark show that LaviGen achieves superior 3D layout generation performance, with 19% higher physical plausibility than the state of the art and 65% faster computation. We will release our code at https://github.com/fenghora/LaviGen.

Pipeline Overview

Applications | Layout Editing

LaviGen naturally supports layout editing through a minimal change to its autoregressive formulation: swapping the prediction targets enables context-aware regeneration. This allows object-level edits to be made directly in native 3D space while keeping the result spatially coherent.

Methodology

Pipeline of the method

As shown in the figure above, LaviGen formulates 3D layout generation as an autoregressive process. Conditioned on LLM-encoded instructions, at step i the model takes the current scene state S_i and the target object O_i and predicts the updated state S_{i+1}. To recover the object pose, we localize the newly generated region by computing the spatial difference between S_{i+1} and S_i, and then fit O_i to this region to obtain its translation, rotation, and scale.
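The step-and-recover loop can be illustrated with a small NumPy sketch. It assumes that scene states are boolean occupancy grids, that objects stand upright with z as the up axis (so only a yaw rotation is fitted), and that O_i is given as a point set. `predict_next` stands in for the model; it and every other function name here are hypothetical. The fit (uniform scale from RMS radii, yaw by a Chamfer-distance grid search, translation from centroids) is one simple choice and not necessarily the fitting procedure LaviGen uses.

```python
import numpy as np


def localize_new_region(occ_prev: np.ndarray, occ_next: np.ndarray) -> np.ndarray:
    """Voxel coords (N, 3) occupied in S_{i+1} but not in S_i (the spatial difference)."""
    return np.argwhere(occ_next & ~occ_prev).astype(float)


def yaw_matrix(theta: float) -> np.ndarray:
    """Rotation about the z axis (assumed here to be the up axis)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def chamfer(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric mean nearest-neighbour distance between two point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def fit_pose(obj_pts: np.ndarray, region_pts: np.ndarray, n_yaw: int = 36):
    """Fit x' = s * R(theta) @ x + t mapping the object points onto the new region."""
    mu_o, mu_r = obj_pts.mean(axis=0), region_pts.mean(axis=0)
    obj_c, reg_c = obj_pts - mu_o, region_pts - mu_r
    # Uniform scale from the ratio of RMS radii (invariant to the yaw rotation).
    s = np.sqrt((reg_c ** 2).sum(axis=1).mean() / (obj_c ** 2).sum(axis=1).mean())
    # Grid search over yaw; keep the angle with the lowest Chamfer distance.
    best_err, best_theta = np.inf, 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, n_yaw, endpoint=False):
        err = chamfer(s * obj_c @ yaw_matrix(theta).T, reg_c)
        if err < best_err:
            best_err, best_theta = err, float(theta)
    # Translation maps the rotated, scaled object centroid onto the region centroid.
    t = mu_r - s * yaw_matrix(best_theta) @ mu_o
    return float(s), best_theta, t


def generate_layout(predict_next, occ0, objects, instruction):
    """Autoregressive loop: (S_i, O_i) -> S_{i+1}, then recover the pose of O_i."""
    occ, poses = occ0, []
    for obj_pts in objects:
        occ_next = predict_next(occ, obj_pts, instruction)  # hypothetical model call
        poses.append(fit_pose(obj_pts, localize_new_region(occ, occ_next)))
        occ = occ_next
    return occ, poses
```

Because each step only needs the difference between consecutive states, previously placed objects never have to be re-fitted. The coarse yaw grid keeps the sketch easy to check; a finer search or a local refinement could be added if sub-voxel accuracy or non-upright rotations matter.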

Concretely, at each step we encode the scene and the target object into latents, concatenate them with a noisy latent, and denoise the unified sequence with an adapted 3D layout diffusion model under instruction conditioning. We further introduce an identity-aware positional embedding that augments RoPE with latent-source identity, explicitly separating scene context from the newly inserted object while preserving spatial alignment. Finally, we apply dual-guidance self-rollout distillation to reduce exposure bias and accelerate inference, combining holistic scene-level supervision with step-wise object-aware guidance to obtain a robust few-step student for long-horizon generation.
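The page does not specify how latent-source identity is folded into RoPE, so the sketch below shows one plausible design under that assumption. Scene, object, and noisy latents are concatenated into one sequence, and the channels are split into four rotary groups: three driven by each token's (x, y, z) position and a fourth driven by a source id (hypothetically 0 for the scene, 1 for the target object, 2 for the noisy latent). Tokens from different sources at the same location then receive different rotations, while attention scores still depend only on relative positions. All function names and the `id_scale` parameter are illustrative.

```python
import numpy as np


def rope_angles(pos: np.ndarray, dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE angles for 1-D positions: shape (N, dim // 2)."""
    freqs = base ** (-np.arange(0, dim, 2) / dim)
    return pos[:, None] * freqs[None, :]


def rotate_pairs(x: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Rotate consecutive channel pairs (x[2k], x[2k+1]) by the given angles."""
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out


def identity_aware_rope(x, coords, source_id, id_scale=1.0):
    """3-D RoPE on (x, y, z) plus a fourth rotary axis driven by the latent source id.

    x: (N, D) tokens with D divisible by 8; coords: (N, 3); source_id: (N,) ints.
    """
    _, d = x.shape
    g = d // 4  # channels per rotary axis (even because D is divisible by 8)
    axes = [coords[:, 0], coords[:, 1], coords[:, 2], source_id * id_scale]
    parts = [
        rotate_pairs(x[:, k * g:(k + 1) * g], rope_angles(np.asarray(axes[k], float), g))
        for k in range(4)
    ]
    return np.concatenate(parts, axis=1)


def build_sequence(scene_lat, obj_lat, noisy_lat, scene_xyz, obj_xyz, noisy_xyz):
    """Concatenate the three latent sources and tag each token with a hypothetical id."""
    tokens = np.concatenate([scene_lat, obj_lat, noisy_lat])
    coords = np.concatenate([scene_xyz, obj_xyz, noisy_xyz])
    ids = np.concatenate([
        np.full(len(scene_lat), 0),
        np.full(len(obj_lat), 1),
        np.full(len(noisy_lat), 2),
    ])
    return identity_aware_rope(tokens, coords, ids)
```

Since each group is an ordinary rotary embedding, shifting all positions (or all ids) by the same amount leaves query-key dot products unchanged, so spatial alignment between scene and object tokens is preserved while their sources stay distinguishable.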

Citation

If you find our work useful, please consider citing:

@misc{feng2026repurposing3dgenerativemodel,
  title={Repurposing 3D Generative Model for Autoregressive Layout Generation},
  author={Haoran Feng and Yifan Niu and Zehuan Huang and Yang-Tian Sun and Chunchao Guo and Yuxin Peng and Lu Sheng},
  year={2026},
  eprint={2604.16299},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.16299},
}