Generate unique images by fine-tuning Stable Diffusion XL with Amazon SageMaker

Stable Diffusion XL by Stability AI is a high-quality text-to-image deep learning model that allows you to generate professional-looking images in various styles. Managed versions of Stable Diffusion XL are already available to you on Amazon SageMaker JumpStart (see Use Stable Diffusion XL with Amazon SageMaker JumpStart in Amazon SageMaker Studio) and Amazon Bedrock (see Stable Diffusion XL in Amazon Bedrock), allowing you to produce creative content in minutes. The base version of Stable Diffusion XL 1.0 assists with the creative process using generic subjects in the image, which enables use cases such as game character design, creative concept generation, film storyboarding, and image upscaling. However, for use cases that require generating images with a unique subject, you can fine-tune Stable Diffusion XL with a custom dataset by using a custom training container with Amazon SageMaker. With this personalized image generation model, you can incorporate your custom subject into the powerful image generation process that is provided by the Stable Diffusion XL base model.

In this post, we provide step-by-step instructions to create a custom, fine-tuned Stable Diffusion XL model using SageMaker to generate unique images. This automated solution helps you get started quickly by providing all the code and configuration necessary to generate your unique images—all you need is images of your subject. This is useful for use cases across various domains such as media and entertainment, games, and retail. Examples include using your custom subject for marketing material for film, character creation for games, and brand-specific images for retail. To explore more AI use cases, visit the AI Use Case Explorer.

Solution overview

The solution is composed of three logical parts:

The first part creates a Docker container image with the necessary framework and configuration for the training container.

The second part uses the training container to perform model training on your dataset, and outputs a fine-tuned custom Low-Rank Adaptation (LoRA) model. LoRA is an efficient fine-tuning method that doesn’t require adjusting the base model parameters. Instead, it adds a smaller number of parameters that are applied to the base model temporarily.

The third part takes the fine-tuned custom model and allows you to generate creative and unique images.

The following diagram illustrates the solution architecture.

The workflow to create the training container consists of the following services:

bring your own container

train.py

Amazon Elastic Container Registry

AWS CodeBuild

Various methods exist to fine-tune your model. Compared to methods that require training a new full model, the LoRA fine-tuning method doesn’t modify the original model. Instead, think of it as a layer on top of the base model. Because you don’t have to train and store a full model for each subject, LoRA lowers the compute requirements for training, reduces the storage size of the models, and shortens training time, which makes the process more cost-effective at scale. In this post, we demonstrate how to create a LoRA model, based on the Stable Diffusion XL 1.0 base model, using your own subject.
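To make the "layer on top of the base model" idea concrete, the following is a minimal sketch in plain Python of the usual LoRA formulation, in which a frozen weight matrix is combined with a scaled low-rank product. The matrix sizes, the alpha / rank scaling, and the function names are illustrative assumptions; they are not taken from the training code in this solution.

```python
# Illustrative sketch of a LoRA update. Names and sizes are assumptions.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_up, lora_down, alpha, rank):
    """Return weight + (alpha / rank) * (lora_up @ lora_down).

    weight:    d_out x d_in, the frozen base-model matrix (left unmodified)
    lora_up:   d_out x rank
    lora_down: rank x d_in
    """
    scale = alpha / rank
    delta = matmul(lora_up, lora_down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def full_param_count(d_in, d_out):
    """Number of values in one full weight matrix."""
    return d_in * d_out


def lora_param_count(d_in, d_out, rank):
    """Number of values in the two low-rank matrices for the same layer."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    up = [[1.0], [2.0]]
    down = [[3.0, 4.0]]
    print(apply_lora(base, up, down, alpha=1.0, rank=1))
    print(full_param_count(1280, 1280), lora_param_count(1280, 1280, 32))
```

For a hypothetical 1280 x 1280 layer at rank 32, the low-rank pair holds 81,920 values against 1,638,400 for the full matrix, which is why a LoRA file is much smaller than a full model.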

The training workflow uses the following services and features:

Amazon Simple Storage Service

Amazon SageMaker Model Training

train.py

invoked

/opt/ml/model

Amazon SageMaker Pipelines

Now you’re ready to prompt your fine-tuned model to generate unique images. SageMaker gives you the flexibility to bring your own container for inference. You can use SageMaker hosting services with your own custom inference container to configure an inference endpoint. However, to demonstrate the Automatic1111 Stable Diffusion UI, we show you how to run inference on an Amazon Elastic Compute Cloud (Amazon EC2) instance (or locally on your own machine).

This solution fully automates the creation of a fine-tuned LoRA model with Stable Diffusion XL 1.0 as the base model. In the following sections, we discuss how to satisfy the prerequisites, download the code, and use the Jupyter notebook in the GitHub repository to deploy the automated solution using an Amazon SageMaker Studio environment.

The code for this end-to-end solution is available in the GitHub repository.

Prerequisites

This solution has been tested in the AWS Region us-west-2, but applies to any Region where these services are available. Make sure you have the following prerequisites:

AWS account

SageMaker domain

SageMaker domain user profile

Download the necessary code in SageMaker Studio

In this section, we walk through the steps to download the necessary code in SageMaker Studio and set up your notebook.

Navigate to the terminal in SageMaker Studio JupyterLab

Complete the following steps to open the terminal:

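Choose Open Studio to open SageMaker Studio.

In Studio, choose JupyterLab to open the JupyterLab application.

If you already have a JupyterLab space, choose Run to start it. Otherwise, choose Create JupyterLab space, choose Create space, and then choose Run space.

When the space status shows Running, choose Open JupyterLab.

In JupyterLab, choose Terminal.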

Download the code to your SageMaker Studio environment

Run the following commands from the terminal. For this post, you check out just the required directories of the GitHub repo (so you don’t have to download the entire repository).

git clone --no-checkout https://github.com/aws/amazon-sagemaker-examples.git
cd amazon-sagemaker-examples/
git sparse-checkout set use-cases/text-to-image-fine-tuning
git checkout

If successful, you should see the output Your branch is up to date with 'origin/main'.

Open the notebook in SageMaker Studio JupyterLab

Complete the following steps to open the notebook:

In the File Browser, navigate to amazon-sagemaker-examples/use-cases/text-to-image-fine-tuning.

Open the kohya-ss-fine-tuning.ipynb notebook.

When prompted to choose a kernel, choose Select.

You now have a kernel that is ready to run commands. In the following steps, we use this notebook to create the necessary resources.

Train a custom Stable Diffusion XL model

In this section, we walk through the steps to train a custom Stable Diffusion XL model.

Set up AWS infrastructure with AWS CloudFormation

For your convenience, an AWS CloudFormation template has been provided to create the necessary AWS resources. Before you create the resources, configure AWS Identity and Access Management (IAM) permissions for your SageMaker IAM role. This role is used by the SageMaker environment, and grants permissions to run certain actions. As with all permissions, make sure you follow the best practice of only granting the permissions necessary to perform your tasks.

Roles

AmazonSageMaker-ExecutionRole-<id>

Permissions policies

AmazonSageMaker-ExecutionPolicy-<id>

Edit

permissions

Next

Save changes

You now have the proper permissions to run commands in your SageMaker environment.

In the kohya-ss-fine-tuning.ipynb notebook, run Step One – Create the necessary resources through AWS CloudFormation.

Wait for the CloudFormation stack to finish creating before moving on. You can monitor the status of the stack creation on the AWS CloudFormation console. This step should take about 2 minutes.

Set up your custom images and fine-tuning configuration file

In this section, you first upload your fine-tuning configuration file to Amazon S3. The configuration file is specific to the Kohya program. Its purpose is to specify the configuration settings programmatically rather than manually using the Kohya GUI.

This file is provided with opinionated values. You can modify the configuration file with different values if desired. For information about what the parameters mean, refer to LoRA training parameters. You will need to experiment to achieve the desired result. Some parameters rely on underlying hardware and GPU (for example, mixed_precision=bf16 or xformers). Make sure your training instance has the proper hardware configuration to support the parameters you select.
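As a small aid for the hardware caveat above, the following sketch flags the two hardware-dependent settings named in this post. It assumes the configuration has already been loaded into a flat dictionary; the function name and the wording of the notes are hypothetical and are not part of the provided solution.

```python
# Hypothetical helper: flags settings that depend on the training GPU.
# Only mixed_precision and xformers are named in this post; the rest is assumed.

def hardware_sensitive_settings(config):
    """Return human-readable notes for settings that depend on the GPU."""
    notes = []
    if config.get("mixed_precision") == "bf16":
        notes.append("mixed_precision=bf16 needs a GPU with bfloat16 support")
    if config.get("xformers"):
        notes.append("xformers needs a GPU and software build that support it")
    return notes


if __name__ == "__main__":
    example = {"mixed_precision": "bf16", "xformers": True}
    for note in hardware_sensitive_settings(example):
        print(note)
```

On Python 3.11 or later, the standard tomllib module can load a TOML file into a dictionary. Depending on how the file is organized, the settings may sit inside nested tables, in which case the lookup would need to be adapted.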

You also need to upload a set of images to Amazon S3. If you don’t have your own dataset and decide to use images from public sources, make sure to adhere to copyright and license restrictions.

The structure of the S3 bucket is as follows:

bucket/0001-dataset/kohya-sdxl-config.toml

bucket/0001-dataset/<asset-folder-name>/ (images and caption files go here)

bucket/0002-dataset/kohya-sdxl-config.toml

bucket/0002-dataset/<asset-folder-name>/ (images and caption files go here)

...

The asset-folder-name uses a special naming convention, which is defined later in this post. Each xxxx-dataset prefix can contain separate datasets with different config file contents. Each pipeline takes a single dataset as input. The config file and asset folder will be downloaded by the SageMaker training job during the training step.
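The layout above can also be produced programmatically. The following is a minimal sketch, not the method used by the notebook: plan_uploads maps local files to S3 keys in this structure, and upload sends them with the boto3 upload_file call. Both function names are hypothetical, and the upload part assumes boto3 is installed and AWS credentials are configured.

```python
from pathlib import Path


def plan_uploads(config_file, asset_dir, dataset_prefix):
    """Map local files to S3 keys following the layout described above.

    Returns a list of (local_path, s3_key) tuples.
    """
    config_file = Path(config_file)
    asset_dir = Path(asset_dir)
    # The config file sits directly under the dataset prefix.
    plan = [(config_file, f"{dataset_prefix}/{config_file.name}")]
    # Images and caption files sit under <dataset prefix>/<asset folder name>/.
    for path in sorted(asset_dir.iterdir()):
        if path.is_file():
            plan.append((path, f"{dataset_prefix}/{asset_dir.name}/{path.name}"))
    return plan


def upload(plan, bucket):
    """Upload every planned file. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so plan_uploads works without boto3

    s3 = boto3.client("s3")
    for local_path, key in plan:
        s3.upload_file(str(local_path), bucket, key)
```

Printing the plan before calling upload is a cheap way to confirm that the keys match the expected prefix structure.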

Complete the following steps:

In the kohya-ss-fine-tuning.ipynb notebook, run Step Two – Upload the fine-tuning configuration file.

sagemaker-kohya-ss-fine-tuning-<account id>

0001-dataset

kohya-sdxl-config.toml

Next, you create an asset folder and upload your custom images and caption files to Amazon S3. The asset-folder-name must be named according to the required naming convention. This naming convention is what defines the number of repetitions and the trigger word for the prompt. The trigger word is what identifies your custom subject. For example, a folder name of 60_dwjz signifies 60 repetitions with the trigger prompt word dwjz. Consider using initials or abbreviations of your subject for the trigger word so it doesn’t collide with existing words. For example, if your subject is a tiger, you could use the trigger word tgr. More repetitions don’t always translate to better results. Experiment to achieve your desired result.

sagemaker-kohya-ss-fine-tuning-<account id>

0001-dataset

Create folder

60_dwjz

Create folder

Upload

Add files

Upload
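The folder naming convention described above (repetitions, an underscore, then the trigger word) is easy to get wrong by hand. The following sketch checks a folder name before you upload; the function name and the exact pattern are assumptions made for illustration.

```python
import re

# Matches "<repetitions>_<trigger word>", for example "60_dwjz".
_FOLDER_PATTERN = re.compile(r"^(\d+)_(.+)$")


def parse_asset_folder(name):
    """Split an asset folder name into (repetitions, trigger_word)."""
    match = _FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"expected '<repetitions>_<trigger word>', got {name!r}")
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    print(parse_asset_folder("60_dwjz"))
```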

When selecting images to use, favor quality over quantity. Some preprocessing of your image assets might be beneficial, such as cropping a person if you are fine-tuning a human subject. For this example, we used approximately 30 images for a human subject with great results. Most of them were high resolution, and cropped to include the human subject only—head and shoulders, half body, and full body images were included but not required.

Optionally, you can use caption files to help your model understand your prompts better. Caption files have the .caption extension, and their contents describe the image (for example, dwjz wearing a vest and sunglasses, serious facial expression, headshot, 50mm). Each caption file name should match the name of the image file it describes. Caption files are optional but highly encouraged. Upload your caption files to the same prefix as your images.
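Because image and caption file names need to match, a quick local check before uploading can catch gaps. The following is a minimal sketch; the function name and the set of image extensions are assumptions, so adjust them to your dataset.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image types


def images_missing_captions(asset_dir):
    """Return names of images that have no same-named .caption file."""
    asset_dir = Path(asset_dir)
    missing = []
    for path in sorted(asset_dir.iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            # 1.jpg is expected to be paired with 1.caption in the same folder.
            if not path.with_suffix(".caption").exists():
                missing.append(path.name)
    return missing
```

An empty list means every image in the folder has a caption file next to it.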

At the end of your upload, your S3 prefix structure should look similar to the following:

bucket/0001-dataset/kohya-sdxl-config.toml

bucket/0001-dataset/60_dwjz/

bucket/0001-dataset/60_dwjz/1.jpg

bucket/0001-dataset/60_dwjz/1.caption

bucket/0001-dataset/60_dwjz/2.jpg

bucket/0001-dataset/60_dwjz/2.caption

...

Fine-tuning involves many variables, and as of this writing there are no definitive recommendations for generating great results. As a starting point, make sure the training runs for enough steps, the image assets have good resolution, and the dataset contains enough images.
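To get a feel for how many steps a dataset produces, the following sketch applies a common rule of thumb for Kohya-style training: steps per epoch are roughly images times repetitions divided by batch size. This formula, the function name, and the epochs and batch size values are assumptions rather than values read from the provided configuration file, and settings such as gradient accumulation would change the result.

```python
import math


def estimated_training_steps(num_images, repeats, epochs=1, batch_size=1):
    """Estimate steps as ceil(images * repeats / batch_size) * epochs."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # About 30 images in a 60_dwjz folder, one epoch, batch size 1.
    print(estimated_training_steps(30, 60))
```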

Set up the required code

The code required for this solution is provided and will be uploaded to the CodeCommit repository that was created by the CloudFormation template. This code is used to build the custom training container. Any update to the code in this repository invokes, through an EventBridge rule, a build of the container image, which is then pushed to Amazon ECR.

The code consists of the following components:

buildspec.yml

Dockerfile

train.py

Complete the following steps to create the training container image:

In the kohya-ss-fine-tuning.ipynb notebook, run Step Three – Upload the necessary code to the AWS CodeCommit repository.

This event will initiate the process that creates the training container image and uploads the image to Amazon ECR.

kohya-ss-fine-tuning-build-container

Latest build status should display as In progress. Wait for the build to finish and the status to change to Succeeded. The build takes about 15 minutes.

A new training container image is now available in Amazon ECR. Every time you make a change to the code in the CodeCommit repository, a new container image will be created.

Initiate the model training

Now that you have a training container image, you can use SageMaker Pipelines with a training step to train your model. SageMaker Pipelines enables you to build powerful multi-step pipelines. There are many step types provided for you to extend and orchestrate your workflows, allowing you to evaluate models, register models, consider conditional logic, run custom code, and more. The following steps are used in this pipeline:

Condition step

Training step

Fail step

Complete the following steps to initiate model training:

Pipelines

kohya-ss-fine-tuning-pipeline

Create

0001-dataset

Create

TrainNewFineTunedModel

The Details tab displays metadata, logs, and the associated training job. The Overview tab displays the output model location in Amazon S3 when training is complete (note this Amazon S3 location for use in later steps). SageMaker processes the training output by uploading the model in the /opt/ml/model directory of the training container to Amazon S3, in the location specified by the training job.

Wait for the pipeline status to show as Succeeded before proceeding to the next step.
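The output model location noted earlier is an S3 URI, and the later download step needs the bucket and key separately. The following sketch splits the URI and, optionally, creates a presigned download URL with the boto3 generate_presigned_url call. The function names and the one-hour expiry are assumptions, and the second function needs boto3 plus AWS credentials.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def presigned_download_url(uri, expires_in=3600):
    """Create a time-limited download URL. Requires boto3 and credentials."""
    import boto3  # imported lazily so split_s3_uri works without boto3

    bucket, key = split_s3_uri(uri)
    return boto3.client("s3").generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```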

Run inference on a custom Stable Diffusion XL model

There are many options for model hosting. For this post, we demonstrate how to run inference with the Automatic1111 Stable Diffusion web UI running on an EC2 instance. This tool enables you to use various image generation features through a user interface. It’s a straightforward way to learn the available parameters in a visual format and to experiment with supplementary features. However, you can also use SageMaker to host an inference endpoint, and you have the option to use your own custom inference container.

Install the Automatic1111 Stable Diffusion web UI on Amazon EC2

Complete the following steps to install the web UI:

Get started with Amazon EC2

Windows Server 2022 Base Amazon Machine Image

g5.8xlarge

Install NVIDIA drivers

Automatic Installation on Windows

GitHub repo

webui-user.bat

Hugging Face

sd_xl_base_1.0.safetensors

../stable-diffusion-webui/models/Stable-diffusion/

Reload UI

sd_xl_base_1.0.safetensors

Stable Diffusion checkpoint

Width

Height

Sampling method

Sampling steps

CFG Scale

Seed

The input prompt is extremely important to achieve great results. You can add extensions to assist with your creative workflow. This style selector extension is great at supplementing prompts.

Extensions

Install from URL

Install

You will notice a new section called SDXL Styles, which you can select from to add to your prompts.

Download the fine-tuned model that was created by the SageMaker pipeline training step.

The model is stored in Amazon S3 with the file name model.tar.gz.

Share with a presigned URL

model.tar.gz

custom_lora_model.safetensors

../stable-diffusion-webui/models/Lora

Refresh

Lora

custom_lora_model

<lora:custom_lora_model:1>

1

Generate
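The extraction step in the list above (getting the .safetensors file out of model.tar.gz and into the web UI's models/Lora folder) can be sketched with the Python standard library. The function name is hypothetical, and the destination path is whatever your installation uses.

```python
import shutil
import tarfile
from pathlib import Path


def extract_lora_weights(archive_path, dest_dir):
    """Copy every .safetensors member of the archive into dest_dir.

    Only the base file name is used, so members cannot escape dest_dir.
    Returns the list of written paths.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(".safetensors"):
                target = dest_dir / Path(member.name).name
                # Stream the member to disk instead of loading it into memory.
                with archive.extractfile(member) as source, open(target, "wb") as out:
                    shutil.copyfileobj(source, out)
                written.append(target)
    return written
```

For example, calling extract_lora_weights("model.tar.gz", "stable-diffusion-webui/models/Lora") would place custom_lora_model.safetensors where the web UI looks for LoRA models, assuming that is the file name inside the archive.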

Example results

These results are from a fine-tuned model trained on 39 high-resolution images of the author, using the provided code and configuration files in this solution. Caption files were written for each of these images, using the trigger word aallzz.

Prompt: concept art <lora:custom_lora_model:1.0> aallzz professional headshot, cinematic, bokeh, dramatic lighting, shallow depth of field, vignette, highly detailed, high budget, 8k, cinemascope, moody, epic, gorgeous, digital artwork, illustrative, painterly, matte painting
Negative Prompt: photo, photorealistic, realism, anime, abstract, glitch
Sampler: DPM2
Sampling Steps: 90
CFG Scale: 8.5
Width/Height: 1024×1024

Prompt: cinematic film still <lora:custom_lora_model:1> aallzz eating a burger, cinematic, bokeh, dramatic lighting, shallow depth of field, vignette, highly detailed, high budget, cinemascope, moody, epic, gorgeous, film grain, grainy
Negative Prompt: anime, cartoon, graphic, painting, graphite, abstract, glitch, mutated, disfigured
Sampler: DPM2
Sampling Steps: 70
CFG Scale: 8
Width/Height: 1024×1024

Prompt: concept art <lora:custom_lora_model:1> aallzz 3D profile picture avatar, vector icon, character, mountain background, sun backlight, digital artwork, illustrative, painterly, matte painting, highly detailed
Negative Prompt: photo, photorealistic, realism, glitch, mutated, disfigured, glasses
Sampler: DPM2
Sampling Steps: 100
CFG Scale: 9
Width/Height: 1024×1024

Prompt: concept art <lora:custom_lora_model:1> aallzz 3D profile picture avatar, vector icon, vector illustration, vector art, realistic cartoon character, professional attire, digital artwork, illustrative, painterly, matte painting, highly detailed
Negative Prompt: photo, photorealistic, realism, glitch, mutated, disfigured, glasses, hat
Sampler: DPM2
Sampling Steps: 100
CFG Scale: 10
Width/Height: 1024×1024

Prompt: cinematic photo <lora:custom_lora_model:1> aallzz portrait, sitting, magical elephant with large tusks, wearing safari clothing, majestic scenery in the background, river, natural lighting, 50mm, highly detailed, photograph, film, bokeh, professional, 4k, highly detailed
Negative Prompt: drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, glitch, mutated, disfigured, glasses, hat
Sampler: DPM2
Sampling Steps: 100
CFG Scale: 9.5
Width/Height: 1024×1024
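The prompts above share one shape: a style phrase, the <lora:name:weight> tag, the trigger word, and then the description. The following sketch assembles that shape so you can vary the LoRA weight or the description while experimenting. The function names are hypothetical helpers, not part of the web UI.

```python
def lora_tag(model_name, weight=1.0):
    """Format the <lora:name:weight> tag used in the prompts above."""
    return f"<lora:{model_name}:{weight:g}>"


def build_prompt(style, model_name, trigger_word, description, weight=1.0):
    """Assemble 'style <lora tag> trigger description' as one prompt string."""
    return f"{style} {lora_tag(model_name, weight)} {trigger_word} {description}"


if __name__ == "__main__":
    print(build_prompt("concept art", "custom_lora_model", "aallzz", "professional headshot"))
```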

Clean up

To avoid incurring charges, delete the resources you created as part of this solution:

Delete the objects

Delete your container image

delete the stack

kohya-ss-fine-tuning-stack

stop

delete

Stop or delete

Conclusion

Congratulations! You have successfully fine-tuned a custom LoRA model to be used with Stable Diffusion XL 1.0. We created a custom training Docker container, used it to train the LoRA model on a custom dataset, and used the resulting model to generate creative and unique images. The end-to-end training solution was fully automated with a CloudFormation template to help you get started quickly. Now, try creating a custom model with your own subject. To explore more AI use cases, visit the AI Use Case Explorer.

About the Author

Alen Zograbyan is a Sr. Solutions Architect at Amazon Web Services. He currently serves media and entertainment customers, and has expertise in software engineering, DevOps, security, and AI/ML. He has a deep passion for learning, teaching, and photography.
