Personalized image generation aims to produce images of user-specified concepts while enabling flexible editing. Recent training-free approaches, while exhibiting higher computational efficiency than training-based methods, struggle with identity preservation, applicability, and compatibility with diffusion transformers (DiTs). In this paper, we uncover the untapped potential of DiTs, where simply replacing denoising tokens with those of a reference subject achieves zero-shot subject reconstruction. This simple yet effective feature injection technique unlocks diverse scenarios, from personalization to image editing. Building upon this observation, we propose Personalize Anything, a training-free framework that achieves personalized image generation in DiTs through: (1) timestep-adaptive token replacement, which enforces subject consistency via early-stage injection and enhances flexibility through late-stage regularization, and (2) patch perturbation strategies that boost structural diversity. Our method seamlessly supports layout-guided generation, multi-subject personalization, and mask-controlled editing. Evaluations demonstrate state-of-the-art performance in identity preservation and versatility. Our work establishes new insights into DiTs while delivering a practical paradigm for efficient personalization.
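To make the core idea concrete, here is a minimal sketch of timestep-adaptive token replacement inside a generic DiT denoising loop. It assumes a PyTorch-style pipeline; `dit.denoise_step`, `scheduler.add_noise`, and the threshold `tau` are illustrative placeholders, not the released implementation.

```python
# Sketch: early-stage token injection, late-stage free denoising (hypothetical API).
import torch

def personalize(dit, scheduler, ref_tokens, subject_mask, timesteps, prompt_emb, tau=0.7):
    """ref_tokens: clean latent tokens of the reference subject, shape (N, D).
    subject_mask: bool mask over the N token positions to inject into.
    tau: fraction of early timesteps during which tokens are hard-replaced."""
    x = torch.randn_like(ref_tokens)              # start from pure noise
    n_inject = int(tau * len(timesteps))          # early-stage injection window
    for i, t in enumerate(timesteps):             # high noise -> low noise
        if i < n_inject:
            # Early stage: overwrite denoising tokens in the masked region with
            # reference tokens re-noised to the current timestep (identity anchor).
            noisy_ref = scheduler.add_noise(ref_tokens, torch.randn_like(ref_tokens), t)
            x = torch.where(subject_mask[:, None], noisy_ref, x)
        # Late stage: no replacement, so the text prompt can freely refine pose,
        # lighting, and background around the preserved subject.
        x = dit.denoise_step(x, t, prompt_emb)    # one DiT denoising update
    return x
```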
Our method enables: (a) layout-guided generation by translating token-injected regions (see the sketch below), (b) multi-subject composition through sequential token injection, and (c) inpainting and outpainting via specified masks and increased replacement.
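Because DiT tokens correspond to spatial latent patches, layout control in (a) reduces to remapping which token indices receive the injected reference tokens. The helper below is a hypothetical illustration of that remapping; the function name and grid-based indexing are assumptions, not part of the official code.

```python
# Sketch: shift the token-injection mask on the (grid_h x grid_w) patch grid.
import torch

def translate_token_mask(mask, grid_h, grid_w, dy, dx):
    """mask: bool tensor of shape (grid_h * grid_w,) marking injected tokens.
    Returns a mask with the marked region shifted by (dy, dx) patches."""
    grid = mask.view(grid_h, grid_w)
    shifted = torch.zeros_like(grid)
    ys, xs = torch.nonzero(grid, as_tuple=True)   # positions of injected tokens
    ys, xs = ys + dy, xs + dx                     # translate on the patch grid
    keep = (ys >= 0) & (ys < grid_h) & (xs >= 0) & (xs < grid_w)
    shifted[ys[keep], xs[keep]] = True            # drop tokens moved off-grid
    return shifted.view(-1)
```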
@article{feng2025personalize,
  title={Personalize Anything for Free with Diffusion Transformer},
  author={Feng, Haoran and Huang, Zehuan and Li, Lin and Lv, Hairong and Sheng, Lu},
  journal={arXiv preprint arXiv:2503.12590},
  year={2025}
}