LaviGen: Repurposing 3D Generative Model for Autoregressive Layout Generation
Haoran Feng1,2*
Yifan Niu1*
Zehuan Huang1✉
Yang-Tian Sun3
Chunchao Guo4
Yuxin Peng5
Lu Sheng1✉
1Beihang University
2Tsinghua University
3University of Hong Kong
4Tencent-Hunyuan
5Peking University
*Equal Contribution
✉Corresponding Author
CVPR 2026
TL;DR: We introduce LaviGen, a framework that repurposes 3D generative models for autoregressive 3D layout generation, achieving 19% higher physical plausibility and 65% faster computation than the state of the art.
We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from textual descriptions, LaviGen operates directly in the native 3D space, formulating layout generation as an autoregressive process that explicitly models geometric relations and physical constraints among objects, producing coherent and physically plausible 3D scenes. To further enhance this process, we propose an adapted 3D diffusion model that integrates scene, object, and instruction information, and employ a dual-guidance self-rollout distillation mechanism to improve efficiency and spatial accuracy. Extensive experiments on the LayoutVLM benchmark show that LaviGen achieves superior 3D layout generation performance, with 19% higher physical plausibility than the state of the art and 65% faster computation. We will release our code at https://github.com/fenghora/LaviGen.
Pipeline Overview

Applications | Layout Editing

LaviGen naturally supports layout editing through a minimal reformulation of its autoregressive process: swapping the prediction targets enables context-aware regeneration. This allows object-level editing in native 3D space and yields spatially coherent edits.

Methodology

Pipeline of the method

As shown in the figure above, LaviGen formulates 3D layout generation as an autoregressive process. Conditioned on LLM-encoded instructions, at step $i$ it takes the current scene state $S_i$ and the target object $O_i$, and predicts the updated state $S_{i+1}$. To recover the object pose, we localize the newly generated region by computing the spatial difference between $S_{i+1}$ and $S_i$, and then fit $O_i$ to obtain its translation, rotation, and scale.
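The generate-then-localize loop can be illustrated with a small sketch. It assumes, purely for illustration, that a scene state is a boolean voxel occupancy grid, that rotation is a yaw about the vertical axis, and that scale is uniform. The fit (RMS-radius scale plus a brute-force yaw search scored by Chamfer distance) is our stand-in, not the paper's fitting procedure. `predict_next_state` is a hypothetical placeholder for the adapted diffusion model; the toy predictor below simply stamps each object into the grid at a known pose so the recovered poses can be checked.

```python
import numpy as np

def yaw_matrix(theta: float) -> np.ndarray:
    """Rotation about the vertical (z) axis (assumption: yaw-only rotation)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def voxel_centers(mask: np.ndarray, voxel_size: float) -> np.ndarray:
    """World coordinates (N, 3) of the occupied cells of a boolean grid."""
    return (np.argwhere(mask) + 0.5) * voxel_size

def chamfer(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between two small point sets (brute force)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def recover_pose(s_prev, s_next, obj_points, voxel_size, n_yaw=36):
    """Fit the canonical object O_i to the region that appears between S_i and S_{i+1}."""
    region = voxel_centers(s_next & ~s_prev, voxel_size)  # spatial difference of the two states
    if len(region) == 0:
        raise ValueError("no newly occupied voxels between the two scene states")
    centroid = region.mean(axis=0)
    target = region - centroid
    obj_c = obj_points.mean(axis=0)
    local = obj_points - obj_c
    # Uniform scale from the ratio of RMS radii (rotation does not change the radius).
    scale = np.sqrt((target ** 2).sum(axis=1).mean() / (local ** 2).sum(axis=1).mean())
    best_err, best_theta = np.inf, 0.0
    for theta in np.linspace(0.0, 2 * np.pi, n_yaw, endpoint=False):
        err = chamfer(scale * local @ yaw_matrix(theta).T, target)
        if err < best_err:
            best_err, best_theta = err, theta
    # World point = scale * R @ p + t, so t maps the object centroid onto the region centroid.
    translation = centroid - scale * yaw_matrix(best_theta) @ obj_c
    return translation, best_theta, scale

def autoregressive_layout(scene, objects, predict_next_state, voxel_size):
    """Insert objects one at a time: S_{i+1} = model(S_i, O_i), then read off O_i's pose."""
    poses = []
    for obj in objects:
        next_scene = predict_next_state(scene, obj)
        poses.append(recover_pose(scene, next_scene, obj, voxel_size))
        scene = next_scene
    return poses

def make_toy_predictor(poses, voxel_size):
    """Hypothetical stand-in for the diffusion model: stamps each object at a known pose."""
    queue = list(poses)
    def predict(scene, obj):
        t, theta, scale = queue.pop(0)
        world = scale * obj @ yaw_matrix(theta).T + t
        idx = np.floor(world / voxel_size).astype(int)
        next_scene = scene.copy()
        next_scene[idx[:, 0], idx[:, 1], idx[:, 2]] = True
        return next_scene
    return predict

# Toy run: a flat L-shaped object inserted twice into an empty 32^3 grid.
voxel = 0.1
obj = np.array([[0.05 * i, 0.05 * j, 0.0] for i in range(12) for j in range(4)]
               + [[0.05 * i, 0.05 * j, 0.0] for i in range(4) for j in range(4, 10)])
true_poses = [(np.array([1.05, 1.05, 0.05]), np.pi / 2, 2.0),
              (np.array([2.05, 0.05, 0.05]), 0.0, 2.0)]
poses = autoregressive_layout(np.zeros((32, 32, 32), dtype=bool), [obj, obj],
                              make_toy_predictor(true_poses, voxel), voxel)
```

The brute-force yaw grid keeps the sketch transparent; a real system would likely need full 3D rotations and a more robust registration step, and the scene representation used by LaviGen may differ from a plain occupancy grid.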

Concretely, at each step we encode the scene and the target object into latents, concatenate them with a noisy latent, and denoise the unified sequence with an adapted 3D layout diffusion model under instruction conditioning. We further introduce an identity-aware positional embedding that augments RoPE with latent-source identity, explicitly separating scene context from the newly inserted object while preserving spatial alignment. Finally, we apply dual-guidance self-rollout distillation to reduce exposure bias and accelerate inference, combining holistic scene-level supervision with step-wise object-aware guidance to obtain a robust few-step student for long-horizon generation.
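The identity-aware positional embedding can be sketched as a RoPE variant with one extra rotary axis. For illustration only, we assume each token carries integer voxel coordinates (x, y, z) plus a hypothetical source ID (0 = scene, 1 = object, 2 = noisy latent), that the head dimension is split evenly across the four axes, and that the noisy latent shares the scene's voxel grid; none of these choices is specified above. Because every axis uses a standard rotary rotation, attention scores depend only on relative offsets, so a scene token and a noisy token at the same voxel remain spatially aligned while still differing along the identity axis.

```python
import numpy as np

def rope_1d(x: np.ndarray, pos: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Standard rotary embedding on the last dim of x (N, d), d even, for positions pos (N,)."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)   # (d/2,) rotation frequencies
    ang = pos[:, None] * freqs[None, :]          # (N, d/2) rotation angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def identity_aware_rope(x, coords, source_id):
    """Rotate quarter-chunks of x (N, D), D % 8 == 0, by x, y, z and a latent-source ID."""
    coords = np.asarray(coords, dtype=float)
    source_id = np.asarray(source_id, dtype=float)
    axes = [coords[:, 0], coords[:, 1], coords[:, 2], source_id]
    chunks = np.split(x, 4, axis=-1)
    return np.concatenate([rope_1d(c, p) for c, p in zip(chunks, axes)], axis=-1)

def attn_score(q, k, q_coords, k_coords, q_id, k_id):
    """Dot-product score between one rotated query and one rotated key."""
    qr = identity_aware_rope(q, q_coords, q_id)
    kr = identity_aware_rope(k, k_coords, k_id)
    return float((qr * kr).sum())

# Unified sequence: scene, object and noisy latents concatenated, each tagged with its source.
rng = np.random.default_rng(0)
D = 16
scene_coords = rng.integers(0, 8, size=(5, 3))
obj_coords = rng.integers(0, 8, size=(3, 3))
coords = np.concatenate([scene_coords, obj_coords, scene_coords])  # assumed: noisy latent on scene grid
source = np.array([0] * 5 + [1] * 3 + [2] * 5)
tokens = rng.standard_normal((len(source), D))
rotated = identity_aware_rope(tokens, coords, source)

# Same voxel, different source: the identity axis alone changes the attention score.
q, k = rng.standard_normal((1, D)), rng.standard_normal((1, D))
same_voxel_same_src = attn_score(q, k, [[1, 2, 3]], [[1, 2, 3]], [0], [0])
same_voxel_diff_src = attn_score(q, k, [[1, 2, 3]], [[1, 2, 3]], [0], [2])
```

Splitting channels evenly is only one option; the actual model may allocate channels differently or inject source identity in another way.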

Citation

If you find our work useful, please consider citing:

@misc{feng2026repurposing3dgenerativemodel,
  title={Repurposing 3D Generative Model for Autoregressive Layout Generation},
  author={Haoran Feng and Yifan Niu and Zehuan Huang and Yang-Tian Sun and Chunchao Guo and Yuxin Peng and Lu Sheng},
  year={2026},
  eprint={2604.16299},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.16299},
}