MarkTechPost@AI, August 1, 2024
weights2weights: A Subspace in Diffusion Weights that Behaves as an Interpretable Latent Space over Customized Diffusion Models

 

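Researchers explore the weight space of customized diffusion models and construct an interpretable latent space that supports several applications.

📌 The researchers built a dataset of over 60,000 models to explore the weight space of customized diffusion models, which they name "weights2weights" (w2w) and model as a subspace.

📌 By analyzing the w2w space, they demonstrate its utility for sampling new identities, making semantic edits (such as adding a beard), and generating realistic identities from out-of-distribution inputs, showing that it is an interpretable latent space.

📌 The method fine-tunes latent diffusion models with Dreambooth and reduces the dimensionality of the resulting weights with LoRA, creating a manifold of model weights that represents individual identities and supports various operations within that space.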

Generative models, particularly GANs, have demonstrated the ability to encode meaningful visual concepts linearly within their latent space, allowing for controlled image edits, such as altering facial attributes like age or gender. For multi-step generative models like diffusion models, however, a comparable linear latent space has yet to be identified. Recent personalization methods, such as Dreambooth and Custom Diffusion, suggest a potential direction for finding such an interpretable latent space. These methods personalize diffusion models by fine-tuning them on images of a specific subject, which yields identity-specific model weights rather than a latent code within the noise space.

Researchers from UC Berkeley, Snap Inc., and Stanford University explore the weight space of customized diffusion models by creating a dataset of over 60,000 models, each fine-tuned to represent different visual identities. They term this weight space “weights2weights” (w2w) and model it as a subspace. By analyzing this space, they demonstrate its utility for sampling new identities, making semantic edits (like adding a beard), and inverting images to generate realistic identities, even from out-of-distribution inputs. Their findings suggest that this w2w space is an interpretable latent space for identities, enabling various creative applications.

Image-based generative models like VAEs, Flow-based models, GANs, and Diffusion models have been widely used for creating high-quality, photorealistic images. GANs and Diffusion models are particularly noted for their controllability and customization abilities. Research has focused on fine-tuning these models to incorporate user-defined concepts, often by reducing the dimensionality of parameters through techniques like low-rank updates, operating within specific layers, or using hypernetworks. The latent space of GANs, especially the StyleGAN series, has been extensively studied for its editing capabilities, while recent efforts are exploring similar latent spaces within diffusion models. Additionally, studies have examined the structure of weight spaces in deep networks for model ensembling, editing, and other applications.

The method begins by creating a manifold of model weights to represent individual identities. This is done by fine-tuning latent diffusion models using Dreambooth and reducing the dimensionality of the resulting weights through LoRA. The fine-tuned weights form a dataset projected into a lower-dimensional space, termed w2w. Linear directions within this space are identified to correspond to semantic attributes, allowing for identity editing. Additionally, this manifold is used to constrain the inversion of a single image into its identity by optimizing weights within the w2w space, ensuring realistic identity reconstruction.
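The subspace and editing steps described above can be illustrated with a minimal NumPy sketch. It rests on several assumptions: random vectors stand in for flattened LoRA weights, PCA is used as the projection into the lower-dimensional space, and the attribute direction is taken as a simple difference of class means, which is a stand-in and not necessarily how the authors find their directions. All names (`fit_subspace`, `has_attribute`, `edit`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: each row plays the role of one fine-tuned model's
# flattened LoRA weights (random here, purely for illustration).
n_models, n_params, n_components = 200, 512, 16
weights = rng.normal(size=(n_models, n_params))
# Hypothetical binary attribute label per model (e.g. True = has a beard).
has_attribute = rng.integers(0, 2, size=n_models).astype(bool)


def fit_subspace(w, k):
    """PCA via SVD: return the mean and the top-k principal directions."""
    mean = w.mean(axis=0)
    _, _, vt = np.linalg.svd(w - mean, full_matrices=False)
    return mean, vt[:k]  # rows of the basis are orthonormal


def project(w, mean, basis):
    """Map full weight vectors to subspace coefficients."""
    return (w - mean) @ basis.T


def reconstruct(coeffs, mean, basis):
    """Map subspace coefficients back to full weight vectors."""
    return coeffs @ basis + mean


mean, basis = fit_subspace(weights, n_components)
coeffs = project(weights, mean, basis)

# Assumed stand-in for an attribute direction: the normalized difference
# between the mean coefficients of models with and without the attribute.
direction = coeffs[has_attribute].mean(axis=0) - coeffs[~has_attribute].mean(axis=0)
direction /= np.linalg.norm(direction)


def edit(model_weights, strength):
    """Move one model along the attribute direction inside the subspace."""
    c = project(model_weights, mean, basis)
    return reconstruct(c + strength * direction, mean, basis)


edited = edit(weights[0], strength=2.0)
```

Because the edit is applied to the subspace coefficients, the edited weights stay inside the span of the principal directions; in this sketch the `strength` parameter controls how far the model moves along the chosen direction.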

The experiments demonstrate the utility of the w2w space for manipulating human identities across several tasks. Using fine-tuning techniques, a synthetic dataset of ~65,000 identities was generated and encoded into model weights. These weights were used to sample new identities, edit identity attributes, and invert out-of-distribution identities into realistic ones. The w2w space allowed consistent and disentangled edits and preserved identity better than baseline methods. The study also found that increasing the number of models in the w2w space improves the disentanglement of attributes and the preservation of identities.
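Sampling a new identity can be pictured as drawing a new point in the low-dimensional space and mapping it back to model weights. The sketch below is a toy version under stated assumptions: random vectors stand in for the fine-tuned weights, PCA provides the subspace, and each coefficient is modeled as an independent Gaussian, which is an assumption of this example and not a detail confirmed by the article. The function name `sample_identities` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in dataset of flattened model weights (random, for illustration).
n_models, n_params, k = 300, 256, 8
weights = rng.normal(size=(n_models, n_params))

# Build a k-dimensional subspace with PCA (via SVD of the centered data).
mean = weights.mean(axis=0)
_, _, vt = np.linalg.svd(weights - mean, full_matrices=False)
basis = vt[:k]
coeffs = (weights - mean) @ basis.T

# Assumed sampling model: an independent Gaussian per coefficient,
# fitted to the coefficients of the existing models.
coeff_mean = coeffs.mean(axis=0)
coeff_std = coeffs.std(axis=0)


def sample_identities(n):
    """Draw n new coefficient vectors and map them back to weight space."""
    z = rng.normal(size=(n, k))
    new_coeffs = coeff_mean + z * coeff_std
    return new_coeffs @ basis + mean


new_weights = sample_identities(5)
```

Each row of `new_weights` lies in the subspace spanned by the existing models, so in the real setting it would correspond to a full set of model weights that could be loaded and used for generation.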

The study introduces the w2w space, in which diffusion model weights are treated as points in a space defined by other customized models. This space acts as an interpretable latent space for identity manipulation and enables sampling, editing, and inversion of model weights rather than images. While acknowledging the potential misuse for malicious identity manipulation, the authors hope the framework will be used to explore visual creativity and enhance model safety. They also suggest that the w2w space could be generalized to concepts beyond identities, which they leave to future research.



The post weights2weights: A Subspace in Diffusion Weights that Behaves as an Interpretable Latent Space over Customized Diffusion Models appeared first on MarkTechPost.
