AWS Machine Learning Blog, May 30, 03:32
Text-to-image basics with Amazon Nova Canvas

 

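This article is an introductory guide to Amazon Nova Canvas, an AI image generation model available on Amazon Bedrock. It explains how to set up and use the model on Amazon Bedrock, covering the image generation process, the input parameters (text prompt, negative prompt, image dimensions, and CFG scale), and seed values. Detailed steps and examples help readers understand the key techniques of AI image generation, and code samples make it easy to practice.

First, the article details the steps to set up and access Amazon Nova Canvas on Amazon Bedrock, including creating an AWS account, opening the Amazon Bedrock console, choosing an available Region, and enabling model access, giving readers clear guidance for getting started.

Second, the article examines the image generation process, which is diffusion based. It explains how the model starts from random noise and, through iterative denoising guided by the text prompt and image conditioning, arrives at an image that meets the request. It also stresses the importance of safety and fairness in image generation.

Next, the article focuses on the importance of prompts, including the use of positive and negative prompts. It shows how different prompt elements, such as subject description, style references, and compositional elements, control the generated result. It also describes how image dimensions, aspect ratio, CFG scale, and seed values affect the output, with examples for each.

Finally, the article presents the request and response structure of the Amazon Bedrock InvokeModel API and gives a Python code example so that readers can test and practice in their own environment. This helps readers understand how Amazon Nova Canvas works and apply it to real image generation tasks.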

AI image generation has emerged as one of the most transformative technologies in recent years, revolutionizing how you create and interact with visual content. Amazon Nova Canvas is a generative model in the suite of Amazon Nova creative models that enables you to generate realistic and creative images from plain text descriptions.

This post serves as a beginner’s guide to using Amazon Nova Canvas. We begin with the steps to get set up on Amazon Bedrock. Amazon Bedrock is a fully managed service that hosts leading foundation models (FMs) for various use cases such as text, code, and image generation; summarization; question answering; and custom use cases that involve fine-tuning and Retrieval Augmented Generation (RAG). In this post, we focus on the Amazon Nova image generation models available in AWS Regions in the US, in particular, the Amazon Nova Canvas model. We then provide an overview of the image generation process (diffusion) and dive deep into the input parameters for text-to-image generation with Amazon Nova Canvas.

Get started with image generation on Amazon Bedrock

Complete the following steps to get set up with access to Amazon Nova Canvas and the image playground:

    Create an AWS account if you don’t have one already.
    Open the Amazon Bedrock console as an AWS Identity and Access Management (IAM) administrator or appropriate IAM user.
    Confirm and choose one of the Regions where the Amazon Nova Canvas model is available (for example, US East (N. Virginia)).
    In the navigation pane, choose Model access under Bedrock configurations.
    Under What is Model access, choose Modify model access or Enable specific models (if not yet activated).
    Select Nova Canvas, then choose Next.
    On the Review and submit page, choose Submit.
    Refresh the Base models
    If you see the Amazon Nova Canvas model in the Access Granted status, you are ready to proceed with the next steps.
    In the navigation pane, choose Image / Video under Playgrounds.
    Choose Select model, then choose Amazon and Nova Canvas. Then choose Apply.

You are all set up to start generating images with Amazon Nova Canvas on Amazon Bedrock. The following screenshot shows an example of our playground.
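If you would rather confirm from code that the model is offered in your chosen Region, the following is a minimal sketch. It assumes the boto3 `bedrock` control-plane client and its `list_foundation_models` call; the helper name `nova_canvas_listed` is ours, not part of any AWS SDK. A listing only shows that the model is offered in the Region, not that access has been granted to your account.

```python
NOVA_CANVAS_MODEL_ID = "amazon.nova-canvas-v1:0"  # model ID used later in this post


def nova_canvas_listed(bedrock_client, model_id=NOVA_CANVAS_MODEL_ID):
    """Return True if model_id is among the Amazon image models in the client's Region.

    bedrock_client is expected to be boto3.client("bedrock", region_name=...).
    """
    response = bedrock_client.list_foundation_models(
        byProvider="Amazon",
        byOutputModality="IMAGE",
    )
    summaries = response.get("modelSummaries", [])
    return any(summary.get("modelId") == model_id for summary in summaries)
```

A typical call would be `nova_canvas_listed(boto3.client("bedrock", region_name="us-east-1"))`. Passing the client in as an argument keeps the helper easy to exercise without AWS credentials.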

Understanding the generation process

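Amazon Nova Canvas uses diffusion-based approaches to generate images. The model starts from random noise and removes that noise over a series of iterations, guided by the text prompt and any image conditioning, until an image matching the request emerges.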

Prompting fundamentals

Image generation begins with effective prompting—the art of crafting text descriptions that guide the model toward your desired output. Well-constructed prompts include specific details about subject, style, lighting, perspective, mood, and composition, and work better when structured as image captions rather than a command or conversation. For example, rather than saying “generate an image of a mountain,” a more effective prompt might be “a majestic snow-capped mountain peak at sunset with dramatic lighting and wispy clouds, photorealistic style.” Refer to Amazon Nova Canvas prompting best practices for more information about prompting.

Let’s address the following prompt elements and observe their impact on the final output image:

Positive and negative prompts

Positive prompts tell the model what to include. These are the elements, styles, and characteristics you want to observe in the final image. Avoid the use of negation words like “no,” “not,” or “without” in your prompt. Amazon Nova Canvas has been trained on image-caption pairs, and captions rarely describe what isn’t in an image. Therefore, the model has never learned the concept of negation. Instead, use negative prompts to specify elements to exclude from the output.

Negative prompts specify what to avoid. Common negative prompts include “blurry,” “distorted,” “low quality,” “poor anatomy,” “bad proportions,” “disfigured hands,” or “extra limbs,” which help models avoid typical generation artifacts.

In the following examples, we first use the prompt “An aerial view of an archipelago,” then we refine the prompt as “An aerial view of an archipelago. Negative Prompt: Beaches.”

The balance between positive and negative prompting creates a defined creative space for the model to work within, often resulting in more predictable and desirable outputs.
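The split between positive and negative prompts maps onto the `text` and `negativeText` fields of the request body. The sketch below builds such a body and refuses positive prompts that contain the negation words mentioned earlier. The names `find_negations` and `build_request` are illustrative, and the word check is a simple heuristic rather than anything the model or API enforces.

```python
import json
import re

# Negation words the post advises against using in the positive prompt.
NEGATION_WORDS = ("no", "not", "without")


def find_negations(prompt):
    """Return the negation words that appear as whole words in the prompt."""
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    return [word for word in NEGATION_WORDS if word in words]


def build_request(prompt, negative_prompt=None):
    """Build a TEXT_IMAGE request body, keeping exclusions in negativeText."""
    flagged = find_negations(prompt)
    if flagged:
        raise ValueError(f"Move exclusions to the negative prompt; found: {flagged}")
    params = {"text": prompt}
    if negative_prompt:
        params["negativeText"] = negative_prompt
    return json.dumps({"taskType": "TEXT_IMAGE", "textToImageParams": params})


body = build_request("An aerial view of an archipelago", negative_prompt="beaches")
print(body)
```

Raising an error on negation words is a deliberately strict choice for illustration; a gentler variant could log a warning instead.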

Image dimensions and aspect ratios

Amazon Nova Canvas is trained on 1:1, portrait, and landscape resolutions, with generation tasks having a maximum output resolution of 4.19 million pixels (for example, 2048×2048, 2816×1536). For editing tasks, the input image should be no more than 4,096 pixels on its longest side, have an aspect ratio between 1:4 and 4:1, and have a total pixel count of 4.19 million or smaller. Understanding these dimensional limits helps avoid stretched or distorted results, particularly for specialized composition needs.
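To get a feel for these numbers, the following sketch reports the pixel count and reduced aspect ratio of a candidate size and checks the longest-side and aspect-ratio limits quoted above for editing inputs. The function name and the returned keys are our own; treat it as a quick arithmetic aid rather than a complete validator.

```python
from fractions import Fraction

MAX_SIDE = 4096              # longest side, from the text above
MIN_RATIO = Fraction(1, 4)   # narrowest aspect ratio (width:height)
MAX_RATIO = Fraction(4, 1)   # widest aspect ratio (width:height)


def describe_dimensions(width, height):
    """Summarize pixel count, aspect ratio, and the two limits stated above."""
    ratio = Fraction(width, height)
    return {
        "megapixels": round(width * height / 1_000_000, 2),
        "aspect_ratio": f"{ratio.numerator}:{ratio.denominator}",
        "side_ok": max(width, height) <= MAX_SIDE,
        "ratio_ok": MIN_RATIO <= ratio <= MAX_RATIO,
    }


for size in [(2048, 2048), (1280, 720), (4096, 512)]:
    print(size, describe_dimensions(*size))
```

The last size, 4096×512, reduces to 8:1 and so falls outside the 1:4 to 4:1 range even though its longest side is within the limit.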

Classifier-free guidance scale

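The classifier-free guidance (CFG) scale controls how strictly the model follows your prompt. Lower values give the model more creative freedom, while higher values make the output adhere more closely to the prompt.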

In the following examples, we use the prompt “Cherry blossoms, bonsai, Japanese style landscape, high resolution, 8k, lush greens in the background.”

The first image with CFG 2 captures some elements of cherry blossoms and bonsai. The second image with CFG 8 adheres more to the prompt with a potted bonsai, more pronounced cherry blossom flowers, and lush greens in the background.

Think of CFG scale as adjusting how literally your instructions are taken into consideration vs. how much artistic interpretation it applies.
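In the diffusion literature, classifier-free guidance is commonly written as a blend of two predictions: guided = unconditional + scale × (conditional − unconditional). This post does not describe the internals of Amazon Nova Canvas, so the sketch below only illustrates that general formula on toy numbers; the function name and the vectors are made up for the example.

```python
def apply_guidance(uncond, cond, scale):
    """Combine unconditional and prompt-conditioned predictions.

    Standard classifier-free guidance: uncond + scale * (cond - uncond).
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy numbers standing in for a model's two noise predictions.
uncond = [0.0, 1.0, -1.0]
cond = [1.0, 1.0, 0.0]

for scale in (1.0, 2.0, 8.0):
    print(scale, apply_guidance(uncond, cond, scale))
```

At a scale of 1 the result equals the prompt-conditioned prediction; larger scales push the result further in the direction the prompt suggests, which matches the intuition of the prompt being taken more literally.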

Seed values and reproducibility

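Every image generation begins with a randomization seed, essentially a starting number that determines the initial conditions. Reusing a seed with the same prompt and parameters reproduces the same image, while changing the seed produces a different variation.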

Reproducibility through seed values is essential for professional workflows, allowing refined iterations on the prompt or other input parameters to clearly see their effect, rather than completely random generations. The following images are generated using two slightly different prompts (“A portrait of a girl smiling” vs. “A portrait of a girl laughing”), while holding the seed value and all other parameters constant.
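The role of the seed can be illustrated with Python's own random number generator. This is an analogy only: it assumes nothing about how Amazon Nova Canvas draws its starting noise, and `initial_noise` is a hypothetical helper.

```python
import random


def initial_noise(seed, size=4):
    """Draw a small Gaussian noise vector from a generator fixed by the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(seed=42)
same_b = initial_noise(seed=42)
other = initial_noise(seed=43)

print(same_a == same_b)  # identical seed, identical starting noise
print(same_a == other)   # different seed, different starting noise
```

Holding the starting point fixed is what lets a change in the prompt, such as “smiling” to “laughing,” show up as the only difference between two runs.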

All preceding images in this post have been generated using the text-to-image (TEXT_IMAGE) task type of Amazon Nova Canvas, available through the Amazon Bedrock InvokeModel API. The following is the API request and response structure for image generation:

# Request structure
{
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {
        "text": string,          # Positive prompt
        "negativeText": string   # Negative prompt
    },
    "imageGenerationConfig": {
        "width": int,            # Image width in pixels
        "height": int,           # Image height in pixels
        "quality": "standard" | "premium",  # Image quality
        "cfgScale": float,       # Classifier-free guidance scale
        "seed": int,             # Seed value
        "numberOfImages": int    # Number of images to be generated (max 5)
    }
}

# Response structure
{
    "images": string[],  # List of Base64-encoded images
    "error": string
}
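Given that response shape, a small helper can turn the Base64 strings into raw image bytes and surface the `error` field. The sketch below is one way to do it. The name `decode_images` is ours, and it assumes an `error` value may accompany a partial `images` list, so it raises only when no image came back.

```python
import base64


def decode_images(response_json):
    """Return decoded image bytes; raise if an error is reported and no image is present."""
    images = response_json.get("images") or []
    error = response_json.get("error")
    if error and not images:
        raise RuntimeError(f"Image generation failed: {error}")
    return [base64.b64decode(item) for item in images]


# Stand-in response; real entries would be Base64-encoded image files.
fake_response = {"images": [base64.b64encode(b"not a real image").decode("ascii")]}
print(decode_images(fake_response))
```

Each returned item can be written to disk directly or opened with an imaging library.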

Code example

This solution can also be tested locally with a Python script or a Jupyter notebook. For this post, we use an Amazon SageMaker AI notebook using Python (v3.12). For more information, see Run example Amazon Bedrock API requests using an Amazon SageMaker AI notebook. For instructions to set up your SageMaker notebook instance, refer to Create an Amazon SageMaker notebook instance. Make sure the instance is set up in the same Region where Amazon Nova Canvas access is enabled. For this post, we create a Region variable to match the Region where Amazon Nova Canvas is enabled (us-east-1). You must modify this variable if you’ve enabled the model in a different Region. The following code demonstrates text-to-image generation by invoking the Amazon Nova Canvas v1.0 model using Amazon Bedrock. To understand the API request and response structure for different types of generations, parameters, and more code examples, refer to Generating images with Amazon Nova.

import base64  # For encoding/decoding base64 data
import io  # For handling byte streams
import json  # For JSON processing

import boto3  # AWS SDK for Python
from PIL import Image  # Python Imaging Library for image processing
from botocore.config import Config  # For AWS client configuration

# Create a variable to fix the Region to where Nova Canvas is enabled
region = "us-east-1"

# Set up an Amazon Bedrock runtime client
client = boto3.client(
    service_name="bedrock-runtime",
    region_name=region,
    config=Config(read_timeout=300),
)

# Set the content type and accept headers for the API call
accept = "application/json"
content_type = "application/json"

# Define the prompt for image generation
prompt = """A cat sitting on a chair, mountains in the background, low angle shot."""

# Create the request body with generation parameters
api_request = json.dumps({
    "taskType": "TEXT_IMAGE",  # Specify text-to-image generation
    "textToImageParams": {
        "text": prompt
    },
    "imageGenerationConfig": {
        "numberOfImages": 1,  # Generate one image
        "height": 720,        # Image height in pixels
        "width": 1280,        # Image width in pixels
        "cfgScale": 7.0,      # CFG scale
        "seed": 0             # Seed number for generation
    }
})

# Call the Bedrock model to generate the image
response = client.invoke_model(
    body=api_request,
    modelId="amazon.nova-canvas-v1:0",
    accept=accept,
    contentType=content_type,
)

# Parse the JSON response
response_json = json.loads(response.get("body").read())

# Extract the base64-encoded image from the response
base64_image = response_json.get("images")[0]

# Convert the base64 string to ASCII bytes
base64_bytes = base64_image.encode("ascii")

# Decode the base64 bytes to get the actual image bytes
image_data = base64.b64decode(base64_bytes)

# Convert bytes to an image object
output_image = Image.open(io.BytesIO(image_data))

# Display the image
output_image.show()

# Save the image to current working directory
output_image.save("output_image.png")

Clean up

When you have finished testing this solution, clean up your resources to prevent AWS charges from being incurred:

    Back up the Jupyter notebooks in the SageMaker notebook instance.
    Shut down and delete the SageMaker notebook instance.

Cost considerations

Consider the following costs from the solution deployed on AWS:

Conclusion

This post introduced you to AI image generation, and then provided an overview of accessing image models available on Amazon Bedrock. We then walked through the diffusion process and key parameters with examples using Amazon Nova Canvas. The code template and examples demonstrated in this post aim to get you familiar with the basics of Amazon Nova Canvas and get started with your AI image generation use cases on Amazon Bedrock.

For more details on text-to-image generation and other capabilities of Amazon Nova Canvas, see Generating images with Amazon Nova. Give it a try and let us know your feedback in the comments.


About the Author

Arjun Singh is a Sr. Data Scientist at Amazon, experienced in artificial intelligence, machine learning, and business intelligence. He is a visual person and deeply curious about generative AI technologies in content creation. He collaborates with customers to build ML and AI solutions to achieve their desired outcomes. He graduated with a Master’s in Information Systems from the University of Cincinnati. Outside of work, he enjoys playing tennis, working out, and learning new skills.
