AWS Machine Learning Blog – October 22, 2024
Amazon Bedrock Custom Model Import now generally available

Today, we’re pleased to announce the general availability (GA) of Amazon Bedrock Custom Model Import. This feature empowers customers to import and use their customized models alongside existing foundation models (FMs) through a single, unified API. Whether leveraging fine-tuned models like Meta Llama, Mistral Mixtral, and IBM Granite, or developing proprietary models based on popular open-source architectures, customers can now bring their custom models into Amazon Bedrock without the overhead of managing infrastructure or model lifecycle tasks.

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock offers a serverless experience, so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure.

With Amazon Bedrock Custom Model Import, customers can access their imported custom models on demand in a serverless manner, freeing them from the complexities of deploying and scaling models themselves. They’re able to accelerate generative AI application development by using native Amazon Bedrock tools and features such as Knowledge Bases, Guardrails, Agents, and more—all through a unified and consistent developer experience.

Benefits of Amazon Bedrock Custom Model Import include:

    Flexibility to use existing fine-tuned models: Customers can use their prior investments in model customization by importing existing customized models into Amazon Bedrock without the need to recreate or retrain them. This flexibility maximizes the value of previous efforts and accelerates application development.

    Integration with Amazon Bedrock features: Imported custom models can be seamlessly integrated with the native tools and features of Amazon Bedrock, such as Knowledge Bases, Guardrails, Agents, and Model Evaluation. This unified experience enables developers to use the same tooling and workflows across both base FMs and imported custom models.

    Serverless: Customers can access their imported custom models in an on-demand and serverless manner. This eliminates the need to manage or scale underlying infrastructure, as Amazon Bedrock handles all those aspects. Customers can focus on developing generative AI applications without worrying about infrastructure management or scalability issues.

    Support for popular model architectures: Amazon Bedrock Custom Model Import supports a variety of popular model architectures, including Meta Llama 3.2, Mistral 7B, Mixtral 8x7B, and more. Customers can import custom weights in formats like Hugging Face Safetensors from Amazon SageMaker and Amazon S3. This broad compatibility allows customers to work with the models that best suit their specific needs and use cases, allowing for greater flexibility and choice in model selection.

    Use of the Amazon Bedrock Converse API: Amazon Bedrock Custom Model Import allows customers to use their supported fine-tuned models with the Amazon Bedrock Converse API, which simplifies and unifies access to the models.

Getting started with Custom Model Import

One of the critical requirements from our customers is the ability to customize models with their proprietary data while retaining complete ownership and control over the tuned model artifact and its deployment. Customization could be in the form of domain adaptation or instruction fine-tuning. Customers have a wide range of options for fine-tuning models efficiently and cost-effectively. However, hosting models presents its own unique set of challenges: customers want on-demand access to their tuned models without having to deploy, manage, or scale the serving infrastructure themselves.

The Amazon Bedrock Custom Model Import feature addresses these concerns. To bring your custom model into the Amazon Bedrock ecosystem, you need to run an import job. The import job can be invoked using the AWS Management Console or through APIs. In this post, we demonstrate the code for running the model import process through APIs. After the model is imported, you can invoke it by using the model's Amazon Resource Name (ARN).

As of this writing, supported model architectures include Meta Llama (v2, 3, 3.1, and 3.2), Mistral 7B, Mixtral 8x7B, Flan T5, and IBM Granite models such as Granite 3B-Code, 8B-Code, 20B-Code, and 34B-Code.

A few points to be aware of when importing your model:

Importing a model using the Amazon Bedrock console

    1. Go to the Amazon Bedrock console and choose Foundation models and then Imported models from the navigation pane on the left side to get to the Models page. Choose Import model to configure the import process.
    2. Configure the model:
       Enter the location of your model weights. These can be in Amazon S3 or point to a SageMaker model ARN object.
       Enter a job name. We recommend suffixing it with the version of the model. As of now, you need to manage the generative AI operations aspects outside of this feature.
       Configure your AWS Key Management Service (AWS KMS) key for encryption. By default, this is a key owned and managed by AWS.
       Choose a service access role. You can create a new role or use an existing role that has the necessary permissions to run the import process. The permissions must include access to your Amazon S3 bucket if you're specifying model weights through S3.
    3. After the import model job is complete, you will see the model and the model ARN. Make a note of the ARN to use later.
    4. Test the model using the on-demand feature in the Text playground as you would for any base foundation model.

The import process validates that the model configuration complies with the specified architecture by reading the config.json file and checking model architecture values such as the maximum sequence length and other relevant details. It also checks that the model weights are in the Safetensors format. This validation verifies that the imported model meets the necessary requirements and is compatible with the system.
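
As a quick pre-flight check before uploading weights, you can inspect the same fields yourself. The following is a minimal sketch assuming a standard Hugging Face config.json layout; the exact checks Bedrock performs aren't published:

import json

# Inspect the checkpoint's config.json the way a pre-import sanity check
# might, before uploading the weights for import.
with open("model/config.json") as f:
    config = json.load(f)

# Llama-style checkpoints declare the architecture and context length here.
print(config.get("architectures"))            # e.g. ["LlamaForCausalLM"]
print(config.get("max_position_embeddings"))  # maximum sequence length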

Fine-tuning a Meta Llama model on SageMaker

Meta Llama 3.2 offers multi-modal vision and lightweight models, representing Meta’s latest advances in large language models (LLMs). These new models provide enhanced capabilities and broader applicability across various use cases. With a focus on responsible innovation and system-level safety, the Llama 3.2 models demonstrate state-of-the-art performance on a wide range of industry benchmarks and introduce features to help you build a new generation of AI experiences.

SageMaker JumpStart provides FMs through two primary interfaces: SageMaker Studio and the SageMaker Python SDK. This gives you multiple options to discover and use hundreds of models for your use case.

In this section, we’ll show you how to fine-tune the Llama 3.2 3B Instruct model using SageMaker JumpStart. We’ll also share the supported instance types and context for the Llama 3.2 models available in SageMaker JumpStart. Although not highlighted in this post, you can also find other Llama 3.2 model variants that can be fine-tuned using SageMaker JumpStart.

Instruction fine-tuning

The text generation model can be instruction fine-tuned on any text data, provided that the data is in the expected format. The instruction fine-tuned model can be further deployed for inference. The training data must be formatted in a JSON Lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder, but can be saved in multiple JSON Lines files. The training folder can also contain a template.json file describing the input and output formats.
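
For illustration, writing such a file is straightforward. This is a minimal sketch; the field names match the synthetic dataset sample shown in the next section, and the placeholder strings stand in for real samples:

import json

# Each line of the .jsonl file is one self-contained training sample.
samples = [
    {"question": "...", "context": "...", "answer": "..."},
]
with open("train.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")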

Synthetic dataset

For this use case, we’ll use a synthetically generated dataset named amazon10Ksynth.jsonl in an instruction-tuning format. This dataset contains approximately 200 entries designed for training and fine-tuning LLMs in the finance domain.

The following is an example of the data format:

instruction_sample = {
    "question": "What is Amazon's plan for expanding their physical store footprint and how will that impact their overall revenue?",
    "context": "The 10-K report mentions that Amazon is continuing to expand their physical store network, including 611 North America stores and 32 International stores as of the end of 2022. This physical store expansion is expected to contribute to increased product sales and overall revenue growth.",
    "answer": "Amazon is expanding their physical store footprint, with 611 North America stores and 32 International stores as of the end of 2022. This physical store expansion is expected to contribute to increased product sales and overall revenue growth."
}

print(instruction_sample)

Prompt template

Next, we create a prompt template for using the data in an instruction input format for the training job (because we are instruction fine-tuning the model in this example), and for inference against the deployed endpoint.

import json

prompt_template = {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}"
}

with open("prompt_template.json", "w") as f:
    json.dump(prompt_template, f)

After the prompt template is created, upload the prepared dataset that will be used for fine-tuning to Amazon S3.

import sagemaker
from sagemaker.s3 import S3Uploader

output_bucket = sagemaker.Session().default_bucket()
local_data_file = "amazon10Ksynth.jsonl"
train_data_location = f"s3://{output_bucket}/amazon10Ksynth_dataset"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("prompt_template.json", train_data_location)
print(f"Training data: {train_data_location}")

Fine-tuning the Meta Llama 3.2 3B model

Now, we’ll fine-tune the Llama 3.2 3B model on the financial dataset. The fine-tuning scripts are based on the scripts provided by the Llama fine-tuning repository.

from sagemaker.jumpstart.estimator import JumpStartEstimator

# JumpStart model identifier for Llama 3.2 3B (assumed here; verify the
# exact ID in SageMaker JumpStart for your Region and model variant).
model_id, model_version = "meta-textgeneration-llama-3-2-3b", "*"

estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    environment={"accept_eula": "true"},
    disable_output_compression=True,
    instance_type="ml.g5.12xlarge",
)

# Set the hyperparameters for instruction tuning
estimator.set_hyperparameters(
    instruction_tuned="True", epoch="5", max_input_length="1024"
)

# Fit the model on the training data
estimator.fit({"training": train_data_location})
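
When training completes, you'll need the S3 location of the trained artifacts for the import step in the next section. A minimal sketch follows; with disable_output_compression=True, the artifacts are written uncompressed under the training job's output prefix:

# Print where SageMaker wrote the trained model artifacts; this S3 prefix
# is what the Amazon Bedrock import job will read from.
print(estimator.model_data)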

Importing a custom model from SageMaker to Amazon Bedrock

In this section, we will use the AWS SDK for Python (Boto3) to create a model import job, get the imported model ARN, and finally generate inferences. You can refer to the console steps in the earlier section for how to import a model using the Amazon Bedrock console.

Parameter and helper function set up

First, we’ll create a few helper functions and set up our parameters to create the import job. The import job is responsible for collecting and deploying the model from SageMaker to Amazon Bedrock. This is done by using the create_model_import_job function.

Stored safetensors need to be formatted so that the Amazon S3 location is the top-level folder, with the configuration files and safetensors stored directly beneath it.

import json
import boto3
from botocore.exceptions import ClientError

bedrock = boto3.client('bedrock', region_name='us-east-1')

job_name = 'fine-tuned-model-import-demo'
sagemaker_model_name = 'meta-textgeneration-llama-3-2-3b-2024-10-12-23-29-57-373'
model_url = {
    's3DataSource': {
        's3Uri': "s3://sagemaker-{REGION}-{AWS_ACCOUNT}/meta-textgeneration-llama-3-2-3b-2024-10-12-23-19-53-906/output/model/"
    }
}
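
With the parameters defined, the import job itself is created with create_model_import_job. A minimal sketch follows; the role ARN and imported model name below are placeholders to substitute with your own values:

# Create the import job; the service role must have read access to the
# S3 prefix holding the model artifacts (placeholder ARN shown here).
role_arn = 'arn:aws:iam::{AWS_ACCOUNT}:role/BedrockModelImportRole'

response = bedrock.create_model_import_job(
    jobName=job_name,
    importedModelName='llama-3-2-3b-finetuned-demo',  # illustrative name
    roleArn=role_arn,
    modelDataSource=model_url,
)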

Check the status and get job ARN from the response:

After a few minutes, the model will be imported, and the status of the job can be checked using get_model_import_job. The job ARN is then used to get the imported model ARN, which we will use to generate inferences.

job_arn = response['jobArn']

def get_import_model_from_job(job_arn):
    response = bedrock.get_model_import_job(jobIdentifier=job_arn)
    return response['importedModelArn']

import_model_arn = get_import_model_from_job(job_arn)
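
If you prefer to wait programmatically rather than checking back after a few minutes, a simple polling loop works. This is a sketch, assuming the job reports a status field of InProgress, Completed, or Failed:

import time

# Poll the import job until it leaves the InProgress state.
while True:
    status = bedrock.get_model_import_job(jobIdentifier=job_arn)['status']
    if status != 'InProgress':
        break
    time.sleep(30)
print(f"Import job finished with status: {status}")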

Generating inferences using the imported custom model

The model can be invoked by using the invoke_model and converse APIs. The following is a helper function that will be used to invoke the model and extract the generated text from the overall output.

import json
import boto3
from botocore.exceptions import ClientError

client = boto3.client('bedrock-runtime', region_name='us-east-1')

def generate_conversation_with_imported_model(native_request, model_id):
    request = json.dumps(native_request)
    try:
        # Invoke the model with the request.
        response = client.invoke_model(modelId=model_id, body=request)
        model_response = json.loads(response["body"].read())
        response_text = model_response["outputs"][0]["text"]
        print(response_text)
    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)
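
For models that support it, the same imported model ARN can also be used with the Converse API mentioned earlier. A minimal sketch, assuming your imported model is Converse-compatible:

# Minimal Converse API call against the imported model (sketch; assumes
# the imported model supports the Converse API).
def converse_with_imported_model(prompt, model_id):
    response = client.converse(
        modelId=model_id,  # the imported model ARN
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 100, "temperature": 0.01},
    )
    return response["output"]["message"]["content"][0]["text"]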

Context set up and model response

Finally, we can use the custom model. First, we format our inquiry to match the fine-tuned prompt structure. This makes sure that the responses generated closely resemble the format used in the fine-tuning phase and are more aligned to our needs. To do this, we use the template that we used to format the data for fine-tuning. The context would typically come from your RAG solution, such as Amazon Bedrock Knowledge Bases. For this example, we take a sample context and add it in to demonstrate the concept:

input_output_demarkation_key = "\n\n### Response:\n"

question = "Tell me what was the improved inflow value of cash?"
context = "Amazon's free cash flow less principal repayments of finance leases and financing obligations improved to an inflow of $46.1 billion for the trailing twelve months, compared with an outflow of $10.1 billion for the trailing twelve months ended March 31, 2023."

# Load the prompt template created earlier so that inference prompts match
# the format used during fine-tuning.
with open("prompt_template.json") as f:
    template = json.load(f)

payload = {
    "prompt": template["prompt"].format(
        question=question,                               # user query
        context=context + input_output_demarkation_key,  # RAG context
    ),
    "max_tokens": 100,
    "temperature": 0.01,
}
generate_conversation_with_imported_model(payload, import_model_arn)

After the model has been fine-tuned and imported into Amazon Bedrock, you can experiment by sending different sets of input questions and context to the model to generate a response, as shown in the following example:

question: """How did Amazon's international segment operating income change             in Q4 2022 compared to the prior year?"""context: """Amazon's international segment reported an operating loss of             $1.1 billion in Q4 2022, an improvement from a $1.7 billion             operating loss in Q4 2021."""response:

Some points to note

The examples in this post are meant to demonstrate Custom Model Import and aren't designed to be used in production. Because the model has been trained on only 200 samples of synthetically generated data, it's only useful for testing purposes. You would ideally have more diverse datasets and additional samples, with continuous experimentation conducted using hyperparameter tuning for your respective use case, thereby steering the model to create a more desirable output. For this post, ensure that the model temperature parameter is set to 0 and the max_tokens runtime parameter is set to a lower value, such as 100–150 tokens, so that a succinct response is generated. You can experiment with other parameters to generate a desirable outcome. See Amazon Bedrock Recipes and GitHub for more examples.

Best practices to consider:

This feature brings significant advantages for hosting your fine-tuned models efficiently. As we continue to develop this feature to meet our customers' needs, there are a few points to be aware of.

Now available

Amazon Bedrock Custom Model Import is generally available today in Amazon Bedrock in the US East (N. Virginia) and US West (Oregon) AWS Regions. See the full Region list for future updates. To learn more, see the Custom Model Import product page and pricing page.

Give Custom Model Import a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.


About the authors

Paras Mehra is a Senior Product Manager at AWS. He is focused on helping build Amazon SageMaker Training and Processing. In his spare time, Paras enjoys spending time with his family and road biking around the Bay Area.

Jay Pillai is a Principal Solutions Architect at Amazon Web Services. In this role, he functions as the Lead Architect, helping partners ideate, build, and launch Partner Solutions. As an Information Technology Leader, Jay specializes in artificial intelligence, generative AI, data integration, business intelligence, and user interface domains. He has 23 years of extensive experience working with several clients across the supply chain, legal technologies, real estate, financial services, insurance, payments, and market research business domains.

Shikhar Kwatra is a Sr. Partner Solutions Architect at Amazon Web Services, working with leading Global System Integrators. He has earned the title of one of the youngest Indian Master Inventors, with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports GSI partners in building strategic industry solutions on AWS.

Claudio Mazzoni is a Sr. GenAI Specialist Solutions Architect at AWS working on world-class applications, guiding customers through their implementation of GenAI to reach their goals and improve their business outcomes. Outside of work, Claudio enjoys spending time with family, working in his garden, and cooking Uruguayan food.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers leverage GenAI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a Ph.D. degree in Electrical Engineering. Outside of work, she loves traveling, working out and exploring new things.

Simon Zamarin is an AI/ML Solutions Architect whose main focus is helping customers extract value from their data assets. In his spare time, Simon enjoys spending time with family, reading sci-fi, and working on various DIY house projects.

Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.
