arXiv:2502.15203v2 Announce Type: replace-cross Abstract: Integrating multiple personalized concepts into a single image has recently gained attention in text-to-image (T2I) generation. However, existing methods often suffer from performance degradation in complex scenes due to distortions in non-personalized regions and the need for additional fine-tuning, limiting their practicality. To address this issue, we propose FlipConcept, a novel approach that seamlessly integrates multiple personalized concepts into a single image without requiring additional tuning. We introduce guided appearance attention to enhance the visual fidelity of personalized concepts. Additionally, we introduce mask-guided noise mixing to protect non-personalized regions during concept integration. Lastly, we apply background dilution to minimize concept leakage, i.e., the undesired blending of personalized concepts with other objects in the image. In our experiments, we demonstrate that the proposed method, despite not requiring tuning, outperforms existing models in both single and multiple personalized concept inference. These results demonstrate the effectiveness and practicality of our approach for scalable, high-quality multi-concept personalization.