AWS Machine Learning Blog December 5, 2024
A guide to Amazon Bedrock Model Distillation (preview)

Amazon Bedrock has launched Model Distillation, a feature designed to improve the performance and efficiency of generative AI applications by transferring knowledge from large models into smaller, faster, more cost-efficient ones. The feature uses synthetic data generation to automatically create high-quality training datasets, so users don't need to build labeled datasets by hand. Users select a suitable teacher model and student model and complete the distillation task through the streamlined workflow Amazon Bedrock provides, ultimately obtaining a more cost-efficient model whose performance matches the larger model for the specific use case. This lets enterprises optimize their AI solutions and achieve a higher return on investment in areas such as Retrieval Augmented Generation, document summarization, chatbot deployment, and text classification.

🤔 **Model distillation concept:** Amazon Bedrock Model Distillation transfers knowledge from a large, high-performing model (the teacher) to a smaller, faster, more cost-efficient model (the student), so that the student achieves performance comparable to the teacher for a specific use case.

🚀 **Key benefits:** Model distillation improves model efficiency, lowers inference costs, and offers advanced customization without requiring you to create a labeled dataset; Amazon Bedrock automatically generates high-quality training data. The feature is also easy to use, providing an automated workflow that simplifies the distillation process.

💡 **Use cases:** Model distillation can be applied to Retrieval Augmented Generation (RAG), document summarization, chatbot deployment, and text classification, helping enterprises build optimized AI solutions and improve return on investment. Examples include enterprise search systems that handle large volumes of concurrent queries, and customer service chatbots that handle large volumes of concurrent real-time conversations.

⚙️ **Workflow:** Amazon Bedrock offers two ways to run model distillation: using historical invocation logs, or uploading use-case specific prompts or labeled prompt-completion pairs. The distillation process includes setting up permissions, selecting models, providing the input dataset, starting the distillation job, and evaluating and deploying the student model.

📊 **Model selection:** Choosing the right teacher and student models is critical. Consider factors such as performance, latency, and cost, and pick a model combination suited to your use case to achieve the best balance of performance and cost-efficiency.

When using generative AI, achieving high performance with low latency models that are cost-efficient is often a challenge, because these goals can clash with each other. With the newly launched Amazon Bedrock Model Distillation feature, you can use smaller, faster, and cost-efficient models that deliver use-case specific accuracy that is comparable to the largest and most capable models in Amazon Bedrock for those specific use cases.

Model distillation is the process of transferring knowledge from a more capable advanced model (teacher) to a smaller model (student), which is faster and more cost efficient to make the student model as performant as the teacher for a specific use-case. To transfer knowledge, your use-case specific prompts are used to first generate responses from the teacher model, and then the teacher responses are used to fine-tune the student model.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities to build generative AI applications, simplifying development with security, privacy, and responsible AI. With Amazon Bedrock Model Distillation, you can now customize models for your use case using synthetic data generated by highly capable models. At preview, Amazon Bedrock Model Distillation offers support for three model providers: Amazon, Anthropic, and Meta. The teacher and student models should be from the same model provider.

This post introduces the workflow of Amazon Bedrock Model Distillation. We first introduce the general concept of model distillation in Amazon Bedrock, and then focus on the important steps, including setting up permissions, selecting the models, providing the input dataset, starting the distillation job, and evaluating and deploying the student model after distillation.

Key benefits of Amazon Bedrock Model Distillation

Use cases for Amazon Bedrock Model Distillation

By distilling knowledge from larger models into smaller, more agile ones, organizations are empowered to develop optimized AI solutions to achieve a higher return on their investments. Here are some applications where a distilled model can make a significant impact:

Amazon Bedrock Model Distillation workflow

Amazon Bedrock offers two options for using Amazon Bedrock Model Distillation. In the first option, you can create a distilled model by providing your production data using historical invocation logs from your previous interactions within Amazon Bedrock. In a production environment, you continue to use the existing Amazon Bedrock Inference APIs, such as the InvokeModel or Converse API, and turn on invocation logs that store model input data (prompts) and model output data (responses). You can optionally add request metadata to these inference requests to filter your invocation logs for specific use cases. By default, Amazon Bedrock reads only the prompts from the invocation logs and generates responses from the teacher model selected in your distillation job. In this scenario, Amazon Bedrock might apply proprietary data synthesis techniques to generate diverse and high-quality responses from the teacher model to augment the fine-tuning dataset, potentially improving the performance of the distilled student model. The student model is then fine-tuned using the prompt and teacher response pairs.

Optionally, you can configure Amazon Bedrock to extract both the prompt and the response from the invocation logs. In this scenario, the teacher model selected in the distillation job must match the teacher model in the invocation logs. No data synthesis techniques are applied; the prompt-response pairs are taken as is from the invocation logs, and the student model is fine-tuned on them.

In the second option, you can directly upload a JSONL file to Amazon Simple Storage Service (Amazon S3) containing your use-case specific prompts or labeled prompt-completion pairs. Amazon Bedrock generates responses from the teacher model for the provided prompts. If you provide a human-generated labeled dataset representing the ground truth, Amazon Bedrock can use these prompt-response pairs as golden examples to generate better teacher responses. The student model is then fine-tuned using the prompt-response pairs generated by the teacher model.

Prerequisites

To use the model distillation feature, make sure that you have satisfied the following requirements:

Both of these fields need to have enough quota to support your Provisioned Throughput model unit. Request a quota increase if necessary to accommodate your expected inference workload.

Model selection

Currently, Amazon Bedrock Model Distillation supports student-teacher combinations within the same model providers (for example, Amazon, Anthropic, or Meta).

Selecting the right models for distillation is crucial. The process involves choosing a teacher model for synthetic data generation and a student model to learn from the teacher’s output. The teacher model is typically larger and more capable, while the student model is smaller, faster, and more cost-efficient.

When selecting models, consider three key dimensions: performance, latency, and cost. These factors are interconnected, and adjusting one can affect the others.

Distillation input dataset

There are two main ways to prepare use-case specific input data for distillation in Amazon Bedrock:

Uploading a JSONL file to S3

If you have a dataset in the JSON Lines (JSONL) format, you can upload it to an S3 bucket. Each record in this JSONL file uses the following structure:

{
    "schemaVersion": "bedrock-conversation-2024",
    "system": [
        {
            "text": string
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": string
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "text": string
                }
            ]
        }
    ]
}

Specifically, each record has a mandatory field, schemaVersion, that must have the value bedrock-conversation-2024 at this launch. The record can optionally include a system prompt that indicates the role assigned to the model. In the messages field, the user role is required, containing the input prompt provided to the model, while the assistant role, containing the desired response, is optional.

At preview, Anthropic and Meta models only accept single-turn conversation prompts, meaning you can only have one user prompt. The Amazon (Nova) models support multi-turn conversations, allowing you to provide multiple user and assistant exchanges within one record.
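As a minimal sketch, records in this schema can be assembled programmatically. The helper below builds single-turn bedrock-conversation-2024 records and writes them to a JSONL training file; the prompts, responses, system text, and file name are illustrative placeholders, not values from the original post:

```python
import json

def make_record(user_prompt, assistant_response=None, system_text=None):
    """Build one record in the bedrock-conversation-2024 schema. The assistant
    turn is optional; prompt-only records let the teacher model generate responses."""
    record = {"schemaVersion": "bedrock-conversation-2024", "messages": []}
    if system_text:
        record["system"] = [{"text": system_text}]
    record["messages"].append({"role": "user", "content": [{"text": user_prompt}]})
    if assistant_response:
        record["messages"].append(
            {"role": "assistant", "content": [{"text": assistant_response}]}
        )
    return record

# Hypothetical prompts and responses for illustration
examples = [
    ("How do I reset my password?", "Go to Settings > Security and choose Reset password."),
    ("What are your support hours?", None),  # prompt-only record
]

with open("training-data.jsonl", "w") as f:
    for prompt, response in examples:
        record = make_record(prompt, response, system_text="You are a helpful support agent.")
        f.write(json.dumps(record) + "\n")
```

The resulting file can then be uploaded to the S3 bucket used as the distillation input dataset.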

Using historical invocation logs

Alternatively, you can use your historical invocation logs stored in Amazon S3 for model distillation. These logs capture the prompts, responses, and metadata from your previous model interactions, making them a valuable source of data. To use this method:

    Enable invocation logging: Make sure that you’ve enabled invocation logging for your account. If you haven’t done this yet, see the prerequisites section for instructions.
    Add metadata to model invocations: When invoking models using the InvokeModel or Converse API, include a requestMetadata field with key-value pairs. This allows you to categorize and filter your interactions later. An example for using the Converse API would be:
{
    "additionalModelRequestFields": JSON value,
    "additionalModelResponseFieldPaths": ["string"],
    "guardrailConfig": {
        "guardrailIdentifier": "string",
        "guardrailVersion": "string",
        "trace": "string"
    },
    "inferenceConfig": {
        "maxTokens": number,
        "stopSequences": ["string"],
        "temperature": number,
        "topP": number
    },
    "messages": [{
        "content": [{
            ...
        }],
        "role": "string"
    }],
    "system": [{
        ...
    }],
    "toolConfig": {
        "toolChoice": {
            ...
        },
        "tools": [{
            ...
        }]
    },
    "requestMetadata": {
        "string": "string", // {"key": "value"}
        "string": "string", // {"key": "value"}
        "string": "string"  // {"key": "value"}
    }
}

A specific example for the requestMetadata field for a sample use case could be:

"requestMetadata": {
    "project": "CustomerService",
    "intent": "BestPractices",
    "priority": "Medium"
}
    Select logs for distillation: When creating a model customization job, you can specify filters to select which invocation logs to use. The API supports various filtering options:
      Include specific logs:
      "requestMetadataFilters": {
          "equals": {"project": "CustomerService"}
      }
      Exclude specific logs:
      "requestMetadataFilters": {
          "notEquals": {"priority": "Low"}
      }
      Combine multiple conditions:
      "requestMetadataFilters": {
          "andAll": [
              {"equals": {"project": "CustomerService"}},
              {"notEquals": {"priority": "Low"}}
          ]
      }
      Use OR logic:
      "requestMetadataFilters": {
          "orAll": [
              {"equals": {"intent": "ComplaintResolution"}},
              {"equals": {"intent": "ProductInquiry"}}
          ]
      }

By following these steps, you can precisely control which data from your invocation logs should be used for distillation, enabling you to target specific use cases, projects, or workflows.
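To illustrate the metadata-tagging step described above, the following sketch assembles a Converse API request that carries a requestMetadata field so its invocation-log entry can later be filtered for distillation. The model ID, prompt, and metadata values are hypothetical, and the actual call (commented out) requires boto3 and AWS credentials:

```python
def build_converse_request(model_id, prompt, metadata):
    """Assemble a Converse API request whose invocation-log entry will carry
    request metadata, so it can later be filtered for distillation."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "requestMetadata": metadata,
    }

# Hypothetical model ID, prompt, and metadata values for illustration:
request = build_converse_request(
    "meta.llama3-1-70b-instruct-v1:0",
    "How do I return a damaged item?",
    {"project": "CustomerService", "intent": "BestPractices", "priority": "Medium"},
)

# With invocation logging enabled, the call below is recorded with its metadata:
# import boto3
# bedrock_runtime = boto3.client("bedrock-runtime")
# response = bedrock_runtime.converse(**request)
```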

Selecting the right data

When selecting data for distillation, whether through a new training JSONL file or historical invocation logs, it’s crucial to choose prompts and responses that are relevant to your use case. The quality and diversity of the data will directly impact the performance of the distilled model.

In general, you should aim to include prompts that cover a wide range of topics and scenarios relevant to your use case. More importantly, a good approach also includes optimizing the prompts for the teacher model to get better responses, so that distillation can perform high-quality knowledge transfer from teacher to student. Specifically, for use cases like RAG, make sure to include prompts that contain the relevant context to be used by the model. For tasks that require a specific response style or format, it’s important to include examples that adhere to the desired style or format.

Be mindful when curating the data used for distillation to help ensure that the distilled model learns the most relevant and valuable knowledge from the teacher model, optimizing its performance for your specific use case.

Run the model distillation

You can start a distillation job either through the Amazon Bedrock console or programmatically using the Amazon Bedrock API. The distillation process requires training data, either by uploading training data in JSONL format to Amazon S3, or by using historical model invocation logs, as we prepared in the prior section.

Before starting a model distillation job, make sure that you’re operating within the boundaries of Amazon Bedrock distillation service quotas.

Let’s explore how to start distillation jobs using different approaches. In the following example, we use Llama 3.1 70B as the teacher model and Llama 3.1 8B as the student model.

Start a distillation job using the console

Amazon Bedrock Model Distillation provides you with an option to run a distillation job through a guided user interface in the console. To start a distillation job through the console, follow these steps:

    Go to the Amazon Bedrock console. Choose Foundation models in the navigation pane, then choose Custom models. In the Customization methods section, choose Create Distillation job.
    For Distilled model name, enter a name for the model. Select Model encryption to add a KMS key. Optionally, expand the Tags section to add tags for tracking.
    For Job name, enter a name for the training job. Optionally, expand the Tags section to add tags for tracking.
    Choose Select model to pick the teacher model of your choice.
    For Categories, choose Meta model family. For Models available for distillation, select Llama 3.1 70B Instruct. Choose Apply.
    Open the drop down under Select a student model. For this example, select Llama 3.1 8B Instruct.
    Specify the Max response length through the slider or directly in the input field. This configuration will be used as an inference parameter for the synthetic data generation by the teacher model.
    As discussed in the prior section, there are two approaches to provide a distillation input dataset.

      If you plan to directly upload a JSONL file to S3, upload your training dataset to the S3 bucket you prepared in the prerequisites section. Under Distillation input dataset, specify the Amazon S3 location for your training dataset. If you plan to use historical invocation logs, select Provide access to invocation logs first, then specify the S3 location for your stored invocation logs. You can add different types of metadata filters to select only the invocation logs relevant to the use case.

You can also configure Amazon Bedrock to only read your prompts or to use the prompt-response pairs. If you choose to only read the prompts, Amazon Bedrock will regenerate responses from the teacher model; if you choose to use prompt-response pairs, Amazon Bedrock will use the responses available in the logs without regenerating them.

Make sure that the teacher model selected for distillation matches the model used in the invocation logs if you want Amazon Bedrock to reuse the responses from the logs.

    Optionally, expand the VPC settings section to specify a VPC that defines the virtual networking environment for this distillation job.
    Under Distillation output metrics data, for S3 location, enter the S3 path for the bucket where you want the training output metrics of the distilled model to be stored.
    Under Service access, select a method to provide Amazon Bedrock with the required IAM permissions to perform the distillation. This happens through assignment of a service role. You can select Use an existing service role if you have already defined a role with fine-grained IAM policies. If you want a new role to be created, select Create and use a new service role and specify a Service role name. View permission details provides you with a comprehensive overview of IAM permissions required.
    After you have added all the required configurations for the Amazon Bedrock Model Distillation job, choose Create Distillation job.
    When the distillation job starts, you can see the status of the job (Training, Complete, or Failed) under Jobs.
    Now select your distillation job. As the distillation job progresses, you can find more information about the job, including job creation time, status, job duration, teacher-student configuration and the distillation input dataset.

Start a distillation job with S3 JSONL data using an API

To use an API to start a distillation job using training data stored in an S3 bucket, follow these steps:

    First, create and configure an Amazon Bedrock client:
    import boto3
    from datetime import datetime

    bedrock_client = boto3.client(service_name="bedrock")

    # Generate unique names for the job and model
    job_name = f"distillation-job-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
    model_name = f"distilled-model-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"

    # Configure your models and IAM role
    teacher_model = "arn:aws:bedrock:us-west-2::foundation-model/meta.llama3-1-70b-instruct-v1:0"
    student_model = "arn:aws:bedrock:us-west-2::foundation-model/meta.llama3-1-8b-instruct-v1:0:128k"
    role_arn = "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/<YOUR_IAM_ROLE>"

    # Specify S3 locations for training data and output
    training_data = "s3://<YOUR_BUCKET>/training-data.jsonl"  # Replace with your training file
    output_path = "s3://<YOUR_BUCKET>/output/"

    # Specify the MaxResponseLengthForInference parameter
    max_response_length = 1000
    Create the distillation job using create_model_customization_job:
    distillation_job_response = bedrock_client.create_model_customization_job(
        jobName=job_name,
        customModelName=model_name,
        roleArn=role_arn,
        baseModelIdentifier=student_model,
        customizationType="DISTILLATION",
        trainingDataConfig={
            "s3Uri": training_data
        },
        outputDataConfig={
            "s3Uri": output_path
        },
        customizationConfig={
            "distillationConfig": {
                "teacherModelConfig": {
                    "teacherModelIdentifier": teacher_model,
                    "maxResponseLengthForInference": max_response_length
                }
            }
        }
    )
    You can monitor the progress of distillation job by providing the job_arn of your model distillation job:
    response = bedrock_client.get_model_customization_job(
        jobIdentifier=job_arn  # Replace with your distillation job ARN
    )
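A minimal polling sketch for this monitoring step might look like the following. The set of terminal status names is an assumption based on the model customization job lifecycle; verify it against the API reference:

```python
import time

# Terminal states reported by get_model_customization_job (an assumption;
# verify against the Amazon Bedrock API reference)
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}

def wait_for_job(client, job_arn, poll_seconds=60):
    """Poll the distillation job until it reaches a terminal state and return it."""
    while True:
        status = client.get_model_customization_job(jobIdentifier=job_arn)["status"]
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)

# status = wait_for_job(bedrock_client, job_arn)
```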

Start a distillation job with an invocation log using an API

To use model invocation logs as training data, make sure that you have collected enough invocation logs in your S3 bucket. First, define the log filter based on the supporting logic referred to in the data preparation section:

# Configure the training data using invocation logs
training_data_config = {
    'invocationLogsConfig': {
        'usePromptResponse': False,
        'invocationLogSource': {
            's3Uri': 's3://<YOUR_BUCKET>/<BUCKET_PREFIX>/AWSLogs'  # Replace with your S3 location
        },
        'requestMetadataFilters': {
            'equals': {
                'project': 'CustomerService'  # Filter logs based on metadata
            }
        }
    }
}

The invocationLogsConfig allows you to specify the Amazon S3 location where your invocation logs are stored, whether to use prompt-response pairs from the logs or generate new responses from the teacher model, and filters to select specific logs based on request metadata.

Then, create the distillation job using the same create_model_customization_job API (configuration parameters are defined as was done in the prior section):

distillation_job_response = bedrock_client.create_model_customization_job(
    jobName=job_name,
    customModelName=model_name,
    roleArn=role_arn,
    baseModelIdentifier=student_model,
    customizationType="DISTILLATION",
    trainingDataConfig=training_data_config,
    outputDataConfig={
        "s3Uri": output_path
    },
    customizationConfig={
        "distillationConfig": {
            "teacherModelConfig": {
                "teacherModelIdentifier": teacher_model,
                "maxResponseLengthForInference": max_response_length
            }
        }
    }
)

Deploy and evaluate the model distillation

After distilling the model, you can evaluate the distillation metrics recorded during the process. These metrics are stored in the specified S3 bucket and include step-wise training metrics with the columns step_number, epoch_number, and training_loss.
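As a sketch of how these metrics might be inspected, the helper below parses the step-wise metrics CSV into a loss curve. The exact object key in the output S3 location is an assumption for illustration:

```python
import csv
import io

def parse_training_metrics(csv_text):
    """Parse step-wise training metrics into a (step, loss) curve."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(int(row["step_number"]), float(row["training_loss"])) for row in reader]

# In practice the CSV comes from the distillation output S3 location, e.g.
# (object key assumed for illustration):
# body = boto3.client("s3").get_object(
#     Bucket="<YOUR_BUCKET>", Key="output/step_wise_training_metrics.csv"
# )["Body"].read().decode()

sample = "step_number,epoch_number,training_loss\n1,1,2.31\n2,1,1.87\n"
curve = parse_training_metrics(sample)  # [(1, 2.31), (2, 1.87)]
```

Plotting this curve helps confirm that the training loss decreased steadily over the distillation run.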

When you’re satisfied with the distillation metrics, you can purchase a Provisioned Throughput to deploy your fine-tuned model, allowing you to take advantage of the improved performance and specialized capabilities of the distilled model in your applications. Provisioned throughput refers to the number and rate of inputs and outputs that a model processes and returns. To use a distilled model, you must purchase a Provisioned Throughput, which is billed hourly. The pricing for a Provisioned Throughput depends on the following factors:

After the Provisioned Throughput is set up, you can use the InvokeModel or Converse API to invoke the distilled model, similar to how the base model is invoked. This provides a seamless transition and maintains compatibility with existing applications or workflows.

It’s crucial to evaluate the performance of the distilled model to make sure that it meets the desired criteria and outperforms in specific tasks. You can conduct various evaluations, including comparing the distilled model with the teacher model to validate its performance.
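One simple way to run such a comparison is to send the same evaluation prompts to both models through the Converse API and review the outputs side by side. The sketch below assumes a bedrock-runtime client and illustrative model identifiers; the default inference settings are placeholders:

```python
def compare_models(runtime, prompts, teacher_id, student_id, inference_config=None):
    """Send each prompt to both models via the Converse API and collect the
    responses side by side for manual or automated review."""
    inference_config = inference_config or {"maxTokens": 512, "temperature": 0.0}
    rows = []
    for prompt in prompts:
        row = {"prompt": prompt}
        for label, model_id in (("teacher", teacher_id), ("student", student_id)):
            response = runtime.converse(
                modelId=model_id,
                messages=[{"role": "user", "content": [{"text": prompt}]}],
                inferenceConfig=inference_config,
            )
            row[label] = response["output"]["message"]["content"][0]["text"]
        rows.append(row)
    return rows

# runtime = boto3.client("bedrock-runtime")
# rows = compare_models(runtime, eval_prompts, teacher_model, provisioned_model_id)
```

The collected rows can then be scored manually or fed to an automated evaluation pipeline.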

Deploy the distilled model using the Amazon Bedrock console

To deploy the distilled model using the Amazon Bedrock console, complete the following steps:

    On the Amazon Bedrock console, choose Custom models in the navigation pane. Select the distilled model and choose Purchase provisioned throughput.
    For Provisioned throughput name, enter a name. Choose the model that you want to deploy. For Commitment term, select your level of commitment (for this post, we choose No commitment). Choose Purchase provisioned throughput.

After the distilled model has been deployed using a Provisioned Throughput, you can see the model status as In Service when you go to the Provisioned throughput page on the Amazon Bedrock console.

You can interact with this distilled model in the Amazon Bedrock playground: select Chat/text, then select the distilled model under Custom & Managed endpoints.

Deploy the distilled model using the Amazon Bedrock API

To deploy the distilled model using the Amazon Bedrock API, complete the following steps:

    Retrieve the distilled model ID from the job’s output, and create a Provisioned Throughput model instance with the desired model units:
    import boto3

    bedrock_client = boto3.client(service_name="bedrock")

    job_arn = distillation_job_response['jobArn']
    custom_model_id = bedrock_client.get_model_customization_job(jobIdentifier=job_arn)['outputModelArn']

    provisioned_model_id = bedrock_client.create_provisioned_model_throughput(
        modelUnits=1,  # Update model units with the desired number
        provisionedModelName='distilled-model',
        modelId=custom_model_id
    )['provisionedModelArn']
    Check the status of your Provisioned Throughput model by running:
    bedrock_client.get_provisioned_model_throughput(
        provisionedModelId=provisioned_model_id
    )['status']
    When the Provisioned Throughput model is ready, you can call the model by using the InvokeModel or Converse API to generate text using the distilled model:
    bedrock_runtime = boto3.client(service_name='bedrock-runtime')

    conversation = [
        {
            "role": "user",
            "content": [{"text": <YOUR_INPUT_TEXT_PROMPT>}],
        }
    ]

    inferenceConfig = {"maxTokens": 2048, "temperature": 0.1, "topP": 0.9}

    response = bedrock_runtime.converse(
        modelId=provisioned_model_id,
        messages=conversation,
        inferenceConfig=inferenceConfig,
    )

    response_text = response["output"]["message"]["content"][0]["text"]

By following these steps, you can deploy and use your distilled model through Amazon Bedrock API, allowing you to generate an efficient and high-performing student model tailored to your use cases. After deploying the distilled model, you can use it for inference in various Amazon Bedrock services, including Knowledge Base inference, Playground, and any other service where custom models can be used for inference.

Conclusion

Amazon Bedrock Model Distillation enables you to create efficient, cost-optimized student models that closely match the performance of larger teacher models for specific use cases. By automating the complex process of knowledge transfer from advanced models to smaller models, Amazon Bedrock simplifies the deployment of faster and less expensive AI solutions without sacrificing accuracy. Customers can benefit from efficiency gains, ease of use, science innovation, and exclusive access to distilled models across providers such as Anthropic and Amazon. With Amazon Bedrock Model Distillation, enterprises can use the power of foundation models while optimizing for latency, cost, and resource constraints to drive AI innovation across industries such as financial services, content moderation, healthcare, and customer service.

We encourage you to start your journey towards cost-effective AI innovation by visiting the Amazon Bedrock console and discovering how model distillation can transform your business.

For additional resources, see the following:


About the authors

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

Aris Tsakpinis is a Specialist Solutions Architect for AI & Machine Learning with a special focus on natural language processing (NLP), large language models (LLMs), and generative AI. In his free time he is pursuing a PhD in ML Engineering at University of Regensburg, focussing on applied NLP in the science domain.

Shreeya Sharma  is a Senior Technical Product Manager at AWS, where she has been working on leveraging the power of Generative AI to deliver innovative and customer-centric products. Shreeya holds a master’s degree from Duke University. Outside of work, she loves traveling, dancing, and singing.

Sovik Kumar Nath is an AI/ML and Generative AI Senior Solutions Architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. He has double master’s degrees from the University of South Florida and University of Fribourg, Switzerland, and a bachelor’s degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, and adventures.
