AWS Machine Learning Blog, July 9, 2024
Generate unique images by fine-tuning Stable Diffusion XL with Amazon SageMaker

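This post describes how to fine-tune the Stable Diffusion XL model with Amazon SageMaker to generate professional-quality images of a unique subject. The solution provides a step-by-step guide that covers fine-tuning with a custom dataset and using a custom training container to generate unique images.

The solution consists of three logical parts: the first creates a Docker container image that holds the frameworks and configuration required by the training container; the second uses the training container to train a model on the dataset and outputs a fine-tuned custom low-rank adaptation (LoRA) model; the third uses the fine-tuned custom model to generate creative and unique images.

The training workflow uses the following services and features: Amazon S3 to store the custom dataset and configuration file; Amazon SageMaker Model Training to manage the training job; and Amazon SageMaker Pipelines to automate the ML workflow.

The LoRA fine-tuning method does not modify the original model. Instead, it adds a small number of parameters to the base model, which lowers the compute requirements for training, reduces the storage size of the model, and shortens training time, making the process more cost-effective at scale.

The solution fully automates the creation of a fine-tuned LoRA model with Stable Diffusion XL 1.0 as the base model. The post discusses how to satisfy the prerequisites, download the code, and use the Jupyter notebook in the GitHub repository to deploy the automated solution in an Amazon SageMaker Studio environment.

The solution has been tested in the AWS Region us-west-2, but applies to any Region where these services are available. The prerequisites are an AWS account, a SageMaker domain, and a SageMaker domain user profile.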

Stable Diffusion XL by Stability AI is a high-quality text-to-image deep learning model that allows you to generate professional-looking images in various styles. Managed versions of Stable Diffusion XL are already available to you on Amazon SageMaker JumpStart (see Use Stable Diffusion XL with Amazon SageMaker JumpStart in Amazon SageMaker Studio) and Amazon Bedrock (see Stable Diffusion XL in Amazon Bedrock), allowing you to produce creative content in minutes. The base version of Stable Diffusion XL 1.0 assists with the creative process using generic subjects in the image, which enables use cases such as game character design, creative concept generation, film storyboarding, and image upscaling. However, for use cases that require generating images with a unique subject, you can fine-tune Stable Diffusion XL with a custom dataset by using a custom training container with Amazon SageMaker. With this personalized image generation model, you can incorporate your custom subject into the powerful image generation process that is provided by the Stable Diffusion XL base model.

In this post, we provide step-by-step instructions to create a custom, fine-tuned Stable Diffusion XL model using SageMaker to generate unique images. This automated solution helps you get started quickly by providing all the code and configuration necessary to generate your unique images—all you need is images of your subject. This is useful for use cases across various domains such as media and entertainment, games, and retail. Examples include using your custom subject for marketing material for film, character creation for games, and brand-specific images for retail. To explore more AI use cases, visit the AI Use Case Explorer.

Solution overview

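The solution is composed of three logical parts:

    The first part creates a Docker container image with the frameworks and configuration required for the training container.
    The second part uses the training container to train a model on your dataset and outputs a fine-tuned custom low-rank adaptation (LoRA) model.
    The third part uses the fine-tuned custom model to generate creative and unique images.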

The following diagram illustrates the solution architecture.

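The workflow to create the training container consists of the following services:

    AWS CodeCommit, which stores the code used to build the training container
    Amazon EventBridge, which uses a rule to start a build whenever the code in the repository changes
    AWS CodeBuild, which builds the training container image
    Amazon Elastic Container Registry (Amazon ECR), which stores the resulting image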

Various methods exist to fine-tune your model. Compared to methods that require training a new full model, the LoRA fine-tuning method doesn’t modify the original model. Instead, think of it as a layer on top of the base model. Not having to train and produce a full model for each subject has its advantages. This lowers the compute requirements for training, reduces the storage size of the models, and decreases the training time required, making the process more cost-effective at scale. In this post, we demonstrate how to create a LoRA model, based on the Stable Diffusion XL 1.0 base model, using your own subject.
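
To make the size difference concrete, here is a minimal sketch of the parameter arithmetic behind low-rank adaptation. It assumes the standard LoRA formulation, in which the update to a d x k weight matrix is factored into a d x r matrix and an r x k matrix. The layer size and rank below are hypothetical values chosen for illustration; they are not taken from this solution's configuration.

```python
def full_param_count(d: int, k: int) -> int:
    """Number of parameters in a dense d x k weight matrix."""
    return d * k


def lora_param_count(d: int, k: int, rank: int) -> int:
    """Number of parameters in a low-rank update B (d x rank) @ A (rank x k)."""
    return rank * (d + k)


# Hypothetical layer size and rank, for illustration only.
d, k, rank = 1280, 1280, 8
full = full_param_count(d, k)
lora = lora_param_count(d, k, rank)
print(f"full layer: {full} parameters")
print(f"LoRA update: {lora} parameters ({lora / full:.2%} of the full layer)")
```

Because only the small factor matrices are trained and saved, the resulting file is a fraction of the size of a full model checkpoint, which is what makes one LoRA model per subject practical.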

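The training workflow uses the following services and features:

    Amazon S3, which stores the custom dataset and configuration file
    Amazon SageMaker Model Training, which manages the training job
    Amazon SageMaker Pipelines, which automates the ML workflow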

Now you’re ready to prompt your fine-tuned model to generate unique images. SageMaker gives you the flexibility to bring your own container for inference. You can use SageMaker hosting services with your own custom inference container to configure an inference endpoint. However, to demonstrate the Automatic1111 Stable Diffusion UI, we show you how to run inference on an Amazon Elastic Compute Cloud (Amazon EC2) instance (or locally on your own machine).

This solution fully automates the creation of a fine-tuned LoRA model with Stable Diffusion XL 1.0 as the base model. In the following sections, we discuss how to satisfy the prerequisites, download the code, and use the Jupyter notebook in the GitHub repository to deploy the automated solution using an Amazon SageMaker Studio environment.

The code for this end-to-end solution is available in the GitHub repository.

Prerequisites

This solution has been tested in the AWS Region us-west-2, but applies to any Region where these services are available. Make sure you have the following prerequisites:

    An AWS account
    A SageMaker domain
    A SageMaker domain user profile

Download the necessary code in SageMaker Studio

In this section, we walk through the steps to download the necessary code in SageMaker Studio and set up your notebook.

Navigate to the terminal in SageMaker Studio JupyterLab

Complete the following steps to open the terminal:

    Log in to your AWS account and open the SageMaker Studio console.
    Select your user profile and choose Open Studio to open SageMaker Studio.
    Choose JupyterLab to open the JupyterLab application. This environment is where you will run the commands.
    If you already have a space created, choose Run to open the space.
    If you don’t have a space, choose Create JupyterLab space. Enter a name for the space and choose Create space. Leave the default values and choose Run space.
    When the environment shows a status of Running, choose Open JupyterLab to open the new space.
    In the JupyterLab Launcher window, choose Terminal.

Download the code to your SageMaker Studio environment

Run the following commands from the terminal. For this post, you check out just the required directories of the GitHub repo (so you don’t have to download the entire repository).

    git clone --no-checkout https://github.com/aws/amazon-sagemaker-examples.git
    cd amazon-sagemaker-examples/
    git sparse-checkout set use-cases/text-to-image-fine-tuning
    git checkout

If successful, you should see the output Your branch is up to date with 'origin/main'.

Open the notebook in SageMaker Studio JupyterLab

Complete the following steps to open the notebook:

    In JupyterLab, choose File Browser in the navigation pane.
    Navigate to the project directory named amazon-sagemaker-examples/use-cases/text-to-image-fine-tuning.
    Open the Jupyter notebook named kohya-ss-fine-tuning.ipynb.
    Choose your runtime kernel (it’s set to use Python 3 by default).
    Choose Select.

You now have a kernel that is ready to run commands. In the following steps, we use this notebook to create the necessary resources.

Train a custom Stable Diffusion XL model

In this section, we walk through the steps to train a custom Stable Diffusion XL model.

Set up AWS infrastructure with AWS CloudFormation

For your convenience, an AWS CloudFormation template has been provided to create the necessary AWS resources. Before you create the resources, configure AWS Identity and Access Management (IAM) permissions for your SageMaker IAM role. This role is used by the SageMaker environment, and grants permissions to run certain actions. As with all permissions, make sure you follow the best practice of only granting the permissions necessary to perform your tasks.

    On the IAM console, choose Roles in the navigation pane.
    Choose the role named AmazonSageMaker-ExecutionRole-<id>. This should be the role that is assigned to your domain.
    In the Permissions policies section, choose the policy named AmazonSageMaker-ExecutionPolicy-<id>.
    Choose Edit to edit the customer managed policy.
    Add the following permissions to the policy, then choose Next.
    Choose Save changes to confirm your added permissions.

You now have the proper permissions to run commands in your SageMaker environment.

    Navigate back to your notebook named kohya-ss-fine-tuning.ipynb in your JupyterLab environment.
    In the notebook step labeled Step One – Create the necessary resources through AWS CloudFormation, run the code cell to create the CloudFormation stack.

Wait for the CloudFormation stack to finish creating before moving on. You can monitor the status of the stack creation on the AWS CloudFormation console. This step should take about 2 minutes.

Set up your custom images and fine-tuning configuration file

In this section, you first upload your fine-tuning configuration file to Amazon S3. The configuration file is specific to the Kohya program. Its purpose is to specify the configuration settings programmatically rather than manually using the Kohya GUI.

This file is provided with opinionated values. You can modify the configuration file with different values if desired. For information about what the parameters mean, refer to LoRA training parameters. You will need to experiment to achieve the desired result. Some parameters rely on underlying hardware and GPU (for example, mixed_precision=bf16 or xformers). Make sure your training instance has the proper hardware configuration to support the parameters you select.

You also need to upload a set of images to Amazon S3. If you don’t have your own dataset and decide to use images from public sources, make sure to adhere to copyright and license restrictions.

The structure of the S3 bucket is as follows:

    bucket/0001-dataset/kohya-sdxl-config.toml
    bucket/0001-dataset/<asset-folder-name>/     (images and caption files go here)
    bucket/0002-dataset/kohya-sdxl-config.toml
    bucket/0002-dataset/<asset-folder-name>/     (images and caption files go here)
    ...

The asset-folder-name uses a special naming convention, which is defined later in this post. Each xxxx-dataset prefix can contain separate datasets with different config file contents. Each pipeline takes a single dataset as input. The config file and asset folder will be downloaded by the SageMaker training job during the training step.
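
The layout described above can be expressed as a few small helpers that build object keys. This is only a sketch: the function names are hypothetical, and the four-digit zero padding is inferred from the 0001-dataset and 0002-dataset examples rather than from any documented rule.

```python
CONFIG_FILE_NAME = "kohya-sdxl-config.toml"


def dataset_prefix(index: int) -> str:
    """Return the dataset prefix for a dataset number, e.g. 1 -> '0001-dataset'."""
    return f"{index:04d}-dataset"


def config_key(index: int) -> str:
    """Return the object key of the Kohya config file for a dataset."""
    return f"{dataset_prefix(index)}/{CONFIG_FILE_NAME}"


def asset_key(index: int, asset_folder: str, filename: str) -> str:
    """Return the object key of an image or caption file in the asset folder."""
    return f"{dataset_prefix(index)}/{asset_folder}/{filename}"


print(config_key(1))
print(asset_key(1, "60_dwjz", "1.jpg"))
```

Keeping each dataset under its own prefix means a pipeline run only needs the dataset prefix as input to locate both the config file and the asset folder.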

Complete the following steps:

    Navigate back to your notebook named kohya-ss-fine-tuning.ipynb in your JupyterLab environment.
    In the notebook step labeled Step Two – Upload the fine-tuning configuration file, run the code cell to upload the config file to Amazon S3.
    Verify that you have an S3 bucket named sagemaker-kohya-ss-fine-tuning-<account id>, with a 0001-dataset prefix containing the kohya-sdxl-config.toml file.

Next, you create an asset folder and upload your custom images and caption files to Amazon S3. The asset-folder-name must be named according to the required naming convention. This naming convention is what defines the number of repetitions and the trigger word for the prompt. The trigger word is what identifies your custom subject. For example, a folder name of 60_dwjz signifies 60 repetitions with the trigger prompt word dwjz. Consider using initials or abbreviations of your subject for the trigger word so it doesn’t collide with existing words. For example, if your subject is a tiger, you could use the trigger word tgr. More repetitions don’t always translate to better results. Experiment to achieve your desired result.

    On the S3 console, navigate to the bucket named sagemaker-kohya-ss-fine-tuning-<account id>.
    Choose the prefix named 0001-dataset.
    Choose Create folder.
    Enter a folder name for your assets using the naming convention (for example, 60_dwjz) and choose Create folder.
    Choose the prefix. This is where your images and caption files go.
    Choose Upload.
    Choose Add files, choose your image files, then choose Upload.
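
Before uploading, it can help to check that a folder name follows the repetitions_triggerword convention. The following is a minimal sketch of such a check; the function and class names are hypothetical, and the pattern only encodes what this post states (a repetition count, an underscore, then the trigger word).

```python
import re
from typing import NamedTuple


class AssetFolder(NamedTuple):
    repeats: int
    trigger_word: str


# <repetitions>_<trigger word>, for example 60_dwjz
_FOLDER_PATTERN = re.compile(r"^(\d+)_(.+)$")


def parse_asset_folder(name: str) -> AssetFolder:
    """Split an asset folder name into its repetition count and trigger word."""
    match = _FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the <repetitions>_<trigger word> convention")
    repeats = int(match.group(1))
    if repeats < 1:
        raise ValueError("the repetition count must be at least 1")
    return AssetFolder(repeats, match.group(2))


print(parse_asset_folder("60_dwjz"))
```

A name such as dwjz_60 or dwjz raises a ValueError, which catches the most likely mistakes before a training job is started.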

When selecting images to use, favor quality over quantity. Some preprocessing of your image assets might be beneficial, such as cropping a person if you are fine-tuning a human subject. For this example, we used approximately 30 images for a human subject with great results. Most of them were high resolution, and cropped to include the human subject only—head and shoulders, half body, and full body images were included but not required.

Optionally, you can use caption files to help your model understand your prompts better. Caption files have the .caption extension, and their contents describe the image (for example, dwjz wearing a vest and sunglasses, serious facial expression, headshot, 50mm). Each caption file must have the same base name as the image it describes (for example, 1.caption for 1.jpg). Caption files are highly encouraged. Upload your caption files to the same prefix as your images.
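
Because captions are matched to images by file name, a quick local check can list images that have no caption yet. This is a sketch only: the function name is hypothetical, and the set of image extensions is an assumption, since this post only shows .jpg files.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumed image extensions; this post only shows .jpg files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def images_missing_captions(filenames: Iterable[str]) -> List[str]:
    """Return the image file names that have no matching .caption file."""
    paths = [PurePosixPath(name) for name in filenames]
    caption_stems = {p.stem for p in paths if p.suffix == ".caption"}
    return sorted(
        p.name
        for p in paths
        if p.suffix.lower() in IMAGE_EXTENSIONS and p.stem not in caption_stems
    )


print(images_missing_captions(["1.jpg", "1.caption", "2.jpg"]))
```

An empty list means every image has a caption file with the same base name.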

At the end of your upload, your S3 prefix structure should look similar to the following:

    bucket/0001-dataset/kohya-sdxl-config.toml
    bucket/0001-dataset/60_dwjz/
    bucket/0001-dataset/60_dwjz/1.jpg
    bucket/0001-dataset/60_dwjz/1.caption
    bucket/0001-dataset/60_dwjz/2.jpg
    bucket/0001-dataset/60_dwjz/2.caption
    ...

There are many variables to fine-tuning, and as of this writing there are no definitive recommendations for generating great results. To achieve good results, include enough steps in the training, good resolution assets, and enough images.
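
As a rough guide to how many training steps a dataset produces, the following sketch multiplies the number of images by the repetition count from the folder name. It assumes that each image is presented that many times per epoch, which is the common reading of the Kohya folder convention. The batch size and epoch count are hypothetical parameters here; the actual values come from your configuration file.

```python
import math


def steps_per_epoch(num_images: int, repeats: int, batch_size: int = 1) -> int:
    """Estimate optimizer steps in one epoch, assuming each image is seen `repeats` times."""
    return math.ceil(num_images * repeats / batch_size)


def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Estimate total optimizer steps across all epochs."""
    return steps_per_epoch(num_images, repeats, batch_size) * epochs


# 30 images in a folder named 60_dwjz, one epoch, batch size 1
print(total_steps(num_images=30, repeats=60, epochs=1))
```

An estimate like this makes it easier to compare experiments, for example fewer images with more repetitions against more images with fewer repetitions.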

Set up the required code

The code required for this solution is provided and will be uploaded to the CodeCommit repository that was created by the CloudFormation template. This code is used to build the custom training container. Any update to the code in this repository triggers, through an EventBridge rule, a new build of the container image, which is then pushed to Amazon ECR.

The code consists of the following components:

Complete the following steps to create the training container image:

    Navigate back to your notebook named kohya-ss-fine-tuning.ipynb in your JupyterLab environment.
    In the step labeled Step Three – Upload the necessary code to the AWS CodeCommit repository, run the code cell to upload the required code to the CodeCommit repository.

This event will initiate the process that creates the training container image and uploads the image to Amazon ECR.

    On the CodeBuild console, locate the project named kohya-ss-fine-tuning-build-container.

Latest build status should display as In progress. Wait for the build to finish and the status to change to Succeeded. The build takes about 15 minutes.

A new training container image is now available in Amazon ECR. Every time you make a change to the code in the CodeCommit repository, a new container image will be created.

Initiate the model training

Now that you have a training container image, you can use SageMaker Pipelines with a training step to train your model. SageMaker Pipelines enables you to build powerful multi-step pipelines. There are many step types provided for you to extend and orchestrate your workflows, allowing you to evaluate models, register models, consider conditional logic, run custom code, and more. The following steps are used in this pipeline:

Complete the following steps to initiate model training:

    On the SageMaker Studio console, in the navigation pane, choose Pipelines.
    Choose the pipeline named kohya-ss-fine-tuning-pipeline.
    Choose Create to create a pipeline run.
    Enter a name, description (optional), and any desired parameter values. You can keep the default settings of using the 0001-dataset for the input data and an ml.g5.8xlarge instance type for training.
    Choose Create to invoke the pipeline.

    Choose the current pipeline run to view its details.
    In the graph, choose the pipeline step named TrainNewFineTunedModel to access the pipeline run information.

The Details tab displays metadata, logs, and the associated training job. The Overview tab displays the output model location in Amazon S3 when training is complete (note this Amazon S3 location for use in later steps). SageMaker processes the training output by uploading the model in the /opt/ml/model directory of the training container to Amazon S3, in the location specified by the training job.

Wait for the pipeline status to show as Succeeded before proceeding to the next step.

Run inference on a custom Stable Diffusion XL model

There are many options for model hosting. For this post, we demonstrate how to run inference with Automatic1111 Stable Diffusion web UI running on an EC2 instance. This tool enables you to use various image generation features through a user interface. It’s a straightforward way to learn the parameters available in a visual format and experiment with supplementary features. For this reason, we demonstrate using this tool as part of this post. However, you can also use SageMaker to host an inference endpoint, and you have the option to use your own custom inference container.

Install the Automatic1111 Stable Diffusion web UI on Amazon EC2

Complete the following steps to install the web UI:

    Create an EC2 Windows instance and connect to it. For instructions, see Get started with Amazon EC2. Choose Windows Server 2022 Base Amazon Machine Image, a g5.8xlarge instance type, a key pair, and 100 GiB of storage. Alternatively, you can use your local machine.
    Install NVIDIA drivers to enable the GPU. This solution has been tested with the Data Center Driver for Windows version 551.78.
    Install the Automatic1111 Stable Diffusion web UI using the instructions in the Automatic Installation on Windows section in the GitHub repo. This solution has been tested with version 1.9.3. The last step of installation will ask you to run webui-user.bat, which will install and launch the Stable Diffusion UI in a web browser.

    Download the Stable Diffusion XL 1.0 Base model from Hugging Face.
    Move the downloaded file sd_xl_base_1.0.safetensors to the directory ../stable-diffusion-webui/models/Stable-diffusion/.
    Scroll to the bottom of the page and choose Reload UI.
    Choose sd_xl_base_1.0.safetensors on the Stable Diffusion checkpoint dropdown menu.
    Adjust the default Width and Height values to 1024 x 1024 for better results.
    Experiment with the remaining parameters to achieve your desired result. Specifically, try adjusting the settings for Sampling method, Sampling steps, CFG Scale, and Seed.

The input prompt is extremely important to achieve great results. You can add extensions to assist with your creative workflow. This style selector extension is great at supplementing prompts.

    To install this extension, navigate to the Extensions tab, choose Install from URL, enter the style selector extension URL, and choose Install. Reload the UI for changes to take effect.

You will notice a new section called SDXL Styles, which you can select from to add to your prompts.

    Download the fine-tuned model that was created by the SageMaker pipeline training step.

The model is stored in Amazon S3 with the file name model.tar.gz.

    You can use the Share with a presigned URL option to share as well.

    Unzip the contents of the model.tar.gz file (twice) and copy the custom_lora_model.safetensors LoRA model file to the directory ../stable-diffusion-webui/models/Lora.
    Choose the Refresh icon on the Lora tab to verify that your custom_lora_model is available.

    Choose custom_lora_model, and it will populate the prompt input box with the text <lora:custom_lora_model:1>.
    Append a prompt to the text (see examples in the next section). You can decrease or increase the multiplier of your LoRA model by changing the 1 value. This adjusts the influence of your LoRA model accordingly.
    Choose Generate to run inference against your fine-tuned LoRA model.
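
The manual extraction of model.tar.gz described earlier in these steps can also be scripted. The sketch below assumes the file is a gzip-compressed tar archive, so the two manual passes (gzip, then tar) become a single tarfile.open call. Where the LoRA file sits inside the archive is not stated in this post, so the hypothetical helper searches for it by file name and copies only that file.

```python
import shutil
import tarfile
from pathlib import Path, PurePosixPath


def extract_lora_file(archive_path, dest_dir, filename="custom_lora_model.safetensors"):
    """Copy `filename` out of a .tar.gz archive into dest_dir and return the new path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            # Match on the base name so the member can sit in any subdirectory.
            if member.isfile() and PurePosixPath(member.name).name == filename:
                target = dest / filename
                with archive.extractfile(member) as source, open(target, "wb") as out:
                    shutil.copyfileobj(source, out)
                return target
    raise FileNotFoundError(f"{filename} not found in {archive_path}")
```

For example, calling extract_lora_file("model.tar.gz", "stable-diffusion-webui/models/Lora") would place the file where the web UI looks for LoRA models; adjust both paths to your own download and installation locations. Writing a single named member, rather than extracting the whole archive, avoids unpacking unexpected paths.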

Example results

These results are from a fine-tuned model trained on 39 high-resolution images of the author, using the provided code and configuration files in this solution. Caption files were written for each of these images, using the trigger word aallzz.
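
The prompts in this section share a pattern: a style phrase, the LoRA tag with its multiplier, the trigger word, and then a description. The following sketch assembles a prompt in that order. The helper names are hypothetical, and the tag syntax is simply the <lora:name:multiplier> text that the web UI inserts.

```python
def lora_tag(model_name: str, weight: float = 1.0) -> str:
    """Build the LoRA tag, e.g. <lora:custom_lora_model:1>."""
    return f"<lora:{model_name}:{weight:g}>"


def build_prompt(
    style: str,
    trigger_word: str,
    description: str,
    model_name: str = "custom_lora_model",
    weight: float = 1.0,
) -> str:
    """Assemble a prompt as: style, LoRA tag, trigger word, description."""
    return f"{style} {lora_tag(model_name, weight)} {trigger_word} {description}"


print(build_prompt("cinematic film still", "aallzz", "eating a burger, film grain"))
print(build_prompt("concept art", "aallzz", "professional headshot", weight=0.8))
```

Lowering the weight below 1 reduces the influence of the fine-tuned subject, which can help when the style phrase is being overpowered.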

Prompt: concept art <lora:custom_lora_model:1.0> aallzz professional headshot, cinematic, bokeh, dramatic lighting, shallow depth of field, vignette, highly detailed, high budget, 8k, cinemascope, moody, epic, gorgeous, digital artwork, illustrative, painterly, matte painting

Negative Prompt: photo, photorealistic, realism, anime, abstract, glitch

Sampler: DPM2

Sampling Steps: 90

CFG Scale: 8.5

Width/Height: 1024×1024

Prompt: cinematic film still <lora:custom_lora_model:1> aallzz eating a burger, cinematic, bokeh, dramatic lighting, shallow depth of field, vignette, highly detailed, high budget, cinemascope, moody, epic, gorgeous, film grain, grainy

Negative Prompt: anime, cartoon, graphic, painting, graphite, abstract, glitch, mutated, disfigured

Sampler: DPM2

Sampling Steps: 70

CFG Scale: 8

Width/Height: 1024×1024

Prompt: concept art <lora:custom_lora_model:1> aallzz 3D profile picture avatar, vector icon, character, mountain background, sun backlight, digital artwork, illustrative, painterly, matte painting, highly detailed

Negative Prompt: photo, photorealistic, realism, glitch, mutated, disfigured, glasses

Sampler: DPM2

Sampling Steps: 100

CFG Scale: 9

Width/Height: 1024×1024

Prompt: concept art <lora:custom_lora_model:1> aallzz 3D profile picture avatar, vector icon, vector illustration, vector art, realistic cartoon character, professional attire, digital artwork, illustrative, painterly, matte painting, highly detailed

Negative Prompt: photo, photorealistic, realism, glitch, mutated, disfigured, glasses, hat

Sampler: DPM2

Sampling Steps: 100

CFG Scale: 10

Width/Height: 1024×1024

Prompt: cinematic photo <lora:custom_lora_model:1> aallzz portrait, sitting, magical elephant with large tusks, wearing safari clothing, majestic scenery in the background, river, natural lighting, 50mm, highly detailed, photograph, film, bokeh, professional, 4k, highly detailed

Negative Prompt: drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, glitch, mutated, disfigured, glasses, hat

Sampler: DPM2

Sampling Steps: 100

CFG Scale: 9.5

Width/Height: 1024×1024

Clean up

To avoid incurring charges, delete the resources you created as part of this solution:

    Delete the objects in your S3 bucket. You must delete the objects before deleting the stack.
    Delete your container image in Amazon ECR. You must delete the image before deleting the stack.
    On the AWS CloudFormation console, delete the stack named kohya-ss-fine-tuning-stack.
    If you created an EC2 instance for running inference, stop or delete the instance.
    Stop or delete your SageMaker Studio instances, applications, and spaces.

Conclusion

Congratulations! You have successfully fine-tuned a custom LoRA model to be used with Stable Diffusion XL 1.0. We created a custom training Docker container, fine-tuned a custom LoRA model to be used with Stable Diffusion XL, and used the resulting model to generate creative and unique images. The end-to-end training solution was fully automated with a CloudFormation template to help you get started quickly. Now, try creating a custom model with your own subject. To explore more AI use cases, visit the AI Use Case Explorer.


About the Author

Alen Zograbyan is a Sr. Solutions Architect at Amazon Web Services. He currently serves media and entertainment customers, and has expertise in software engineering, DevOps, security, and AI/ML. He has a deep passion for learning, teaching, and photography.
