AWS Machine Learning Blog
Mistral-Small-24B-Instruct-2501 is now available on SageMaker Jumpstart and Amazon Bedrock Marketplace

Mistral AI's Mistral-Small-24B-Instruct-2501 is a large language model optimized for low-latency text generation tasks. It is now available to customers through Amazon SageMaker JumpStart and Amazon Bedrock Marketplace, which offers more than 100 popular, emerging, and specialized pre-trained models. Mistral Small 3 (2501) strikes a balance between performance and computational efficiency: it excels at code, math, general knowledge, and instruction following, making it well suited to scenarios that demand fast, accurate responses, such as virtual assistants, customer service automation, and content moderation.

🚀 **Mistral Small 3 (2501) overview**: Mistral-Small-24B-Instruct-2501 is a 24-billion-parameter large language model released by Mistral AI that can be deployed and used on SageMaker JumpStart and Bedrock Marketplace. The model is released under the Apache 2.0 license and is available in both pretrained and instruction-tuned versions.

💡 **Performance metrics and benchmarks**: The instruction-tuned version of the model achieves over 81% accuracy on Massive Multitask Language Understanding (MMLU) at 150 tokens per second. According to Mistral, it performs on par with Llama 3.3 70B instruct while being more than three times faster on the same hardware.

🛒 **Amazon Bedrock Marketplace**: Developers can discover, test, and use more than 100 pre-trained models, including Mistral-Small-24B-Instruct-2501, through Amazon Bedrock Marketplace, with filters to search by provider, modality, or task. The model detail page provides essential information such as the model's capabilities, pricing structure, and implementation guidelines.

🛠️ **SageMaker JumpStart integration**: SageMaker JumpStart offers a collection of pre-trained models, including the Mistral family, that accelerates the development and deployment of ML applications. Users can discover and deploy Mistral models through the SageMaker Studio UI or the SageMaker Python SDK and take advantage of features such as SageMaker Pipelines and SageMaker Debugger.

⚙️ **Deployment steps**: In Amazon Bedrock Marketplace, users configure deployment details including the endpoint name, number of instances, and instance type; a GPU-based instance type is recommended for optimal performance. In SageMaker JumpStart, users can select and deploy the model through the SageMaker Studio UI or the SDK.

Today, we're excited to announce that Mistral-Small-24B-Instruct-2501, a 24-billion-parameter large language model (LLM) from Mistral AI that's optimized for low-latency text generation tasks, is available for customers through Amazon SageMaker JumpStart and Amazon Bedrock Marketplace. Amazon Bedrock Marketplace is a new capability in Amazon Bedrock that developers can use to discover, test, and use over 100 popular, emerging, and specialized foundation models (FMs) alongside the industry-leading models already available in Amazon Bedrock. You can also use this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with one click for running inference. In this post, we walk through how to discover, deploy, and use Mistral-Small-24B-Instruct-2501.

Overview of Mistral Small 3 (2501)

Mistral Small 3 (2501), a latency-optimized 24B-parameter model released under Apache 2.0, maintains a balance between performance and computational efficiency. Mistral offers both the pretrained (Mistral-Small-24B-Base-2501) and instruction-tuned (Mistral-Small-24B-Instruct-2501) checkpoints of the model under Apache 2.0. Mistral Small 3 (2501) features a 32k-token context window. According to Mistral, the model demonstrates strong performance in code, math, general knowledge, and instruction following compared to its peers. Mistral Small 3 (2501) is designed for the 80% of generative AI tasks that require robust language and instruction-following performance with very low latency. The instruction-tuning process focuses on improving the model's ability to follow complex directions, maintain coherent conversations, and generate accurate, context-aware responses. The 2501 version follows previous iterations (Mistral-Small-2409 and Mistral-Small-2402) released in 2024, incorporating improvements in instruction following and reliability. Currently, the instruct version of this model, Mistral-Small-24B-Instruct-2501, is available for customers to deploy and use on SageMaker JumpStart and Bedrock Marketplace.

Optimized for conversational assistance

Mistral Small 3 (2501) excels in scenarios where quick, accurate responses are critical, such as virtual assistants where users expect immediate feedback and near real-time interactions. It can also handle rapid function execution when used as part of automated or agentic workflows. According to Mistral, the architecture is designed to typically respond in less than 100 milliseconds, making it ideal for customer service automation, interactive assistance, live chat, and content moderation.

Performance metrics and benchmarks

According to Mistral, the instruction-tuned version of the model achieves over 81% accuracy on Massive Multitask Language Understanding (MMLU) with 150 tokens per second latency, making it currently the most efficient model in its category. In third-party evaluations conducted by Mistral, the model demonstrates competitive performance against larger models such as Llama 3.3 70B and Qwen 32B. Notably, Mistral claims that the model performs at the same level as Llama 3.3 70B instruct and is more than three times faster on the same hardware.

SageMaker JumpStart overview

SageMaker JumpStart is a fully managed service that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly, accelerating the development and deployment of ML applications. One of the key components of SageMaker JumpStart is model hubs, which offer a vast catalog of pre-trained models, such as Mistral, for a variety of tasks.

You can now discover and deploy Mistral models in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, and use Amazon SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs for model performance and MLOps controls. The model is deployed in a secure AWS environment and under your VPC controls, helping to support data security for enterprise security needs.
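If you prefer to discover models programmatically, the following is a minimal sketch using the SageMaker Python SDK; it lists all JumpStart model IDs and filters for Mistral models on the client side (it assumes the sagemaker package is installed and AWS credentials are configured):

    # List JumpStart model IDs and keep the Mistral ones.
    from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

    all_model_ids = list_jumpstart_models()
    mistral_model_ids = [m for m in all_model_ids if "mistral" in m]
    print(mistral_model_ids)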

Prerequisites

To try Mistral-Small-24B-Instruct-2501 in SageMaker JumpStart, you need the following prerequisites:

    An AWS account that will contain all your AWS resources.
    An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, see Identity and Access Management for Amazon SageMaker.
    Access to Amazon SageMaker Studio, a SageMaker notebook instance, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.
    Access to accelerated instances (GPUs) to host the model.

Amazon Bedrock Marketplace overview

To get started, in the AWS Management Console for Amazon Bedrock, select Model catalog in the Foundation models section of the navigation pane. Here, you can search for models that help you with a specific use case or language. The results of the search include both serverless models and models available in Amazon Bedrock Marketplace. You can filter results by provider, modality (such as text, image, or audio), or task (such as classification or text summarization).

Deploy Mistral-Small-24B-Instruct-2501 in Amazon Bedrock Marketplace

To access Mistral-Small-24B-Instruct-2501 in Amazon Bedrock, complete the following steps:

    On the Amazon Bedrock console, select Model catalog under Foundation models in the navigation pane.

At the time of writing this post, you can use the InvokeModel API to invoke the model. It doesn't support the Converse API or other Amazon Bedrock tooling.

    Filter for Mistral as a provider and select the Mistral-Small-24B-Instruct-2501 model.

The model detail page provides essential information about the model’s capabilities, pricing structure, and implementation guidelines. You can find detailed usage instructions, including sample API calls and code snippets for integration.

The page also includes deployment options and licensing information to help you get started with Mistral-Small-24B-Instruct-2501 in your applications.

    To begin using Mistral-Small-24B-Instruct-2501, choose Deploy.
    You will be prompted to configure the deployment details for Mistral-Small-24B-Instruct-2501. The model ID will be pre-populated.
      For Endpoint name, enter an endpoint name (up to 50 alphanumeric characters).
      For Number of instances, enter a number between 1 and 100.
      For Instance type, select your instance type. For optimal performance with Mistral-Small-24B-Instruct-2501, a GPU-based instance type such as ml.g6.12xlarge is recommended.
    Optionally, you can configure advanced security and infrastructure settings, including virtual private cloud (VPC) networking, service role permissions, and encryption settings. For most use cases, the default settings will work well. However, for production deployments, you might want to review these settings to align with your organization's security and compliance requirements.
    Choose Deploy to begin using the model.

When the deployment is complete, you can test Mistral-Small-24B-Instruct-2501 capabilities directly in the Amazon Bedrock playground.

    Choose Open in playground to access an interactive interface where you can experiment with different prompts and adjust model parameters such as temperature and maximum length.

When using Mistral-Small-24B-Instruct-2501 with the Amazon Bedrock InvokeModel API and the Playground console, use Mistral's chat template for optimal results, for example, <s>[INST] content for inference [/INST].

This is an excellent way to explore the model’s reasoning and text generation abilities before integrating it into your applications. The playground provides immediate feedback, helping you understand how the model responds to various inputs and letting you fine-tune your prompts for optimal results.

You can quickly test the model in the playground through the UI. However, to invoke the deployed model programmatically with Amazon Bedrock APIs, you need to get the endpoint Amazon Resource Name (ARN).
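The following is a minimal sketch of that programmatic path with boto3; the endpoint ARN is a placeholder for the one shown on your Marketplace deployments page, and the prompt-based request body is an assumption based on the chat template above, so confirm the exact request schema on the model detail page:

    # Invoke the Bedrock Marketplace endpoint with the InvokeModel API.
    # The ARN is a placeholder; the request schema is an assumption.
    import json
    import boto3

    bedrock_runtime = boto3.client("bedrock-runtime")
    endpoint_arn = "arn:aws:sagemaker:us-east-1:111122223333:endpoint/my-mistral-endpoint"

    body = json.dumps({
        "prompt": "<s>[INST] What is Amazon Bedrock? [/INST]",
        "max_tokens": 512,
        "temperature": 0.1,
    })

    response = bedrock_runtime.invoke_model(modelId=endpoint_arn, body=body)
    print(json.loads(response["body"].read()))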

Discover Mistral-Small-24B-Instruct-2501 in SageMaker JumpStart

You can access Mistral-Small-24B-Instruct-2501 through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform ML development steps, from preparing data to building, training, and deploying your ML models. For more information about how to get started and set up SageMaker Studio, see Amazon SageMaker Studio.

    In the SageMaker Studio console, access SageMaker JumpStart by choosing JumpStart in the navigation pane.
    From the SageMaker JumpStart landing page, select HuggingFace, then search for Mistral-Small-24B-Instruct-2501 using the search box.
    Select a model card to view details about the model such as license, data used to train, and how to use the model. Choose Deploy to deploy the model and create an endpoint.

Deploy Mistral-Small-24B-Instruct-2501 with the SageMaker SDK

Deployment starts when you choose Deploy. After deployment finishes, you will see that an endpoint is created. Test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you select the option to use the SDK, you will see example code that you can use in the notebook editor of your choice in SageMaker Studio.

    To deploy using the SDK, select the Mistral-Small-24B-Instruct-2501 model, specified by the model_id value huggingface-llm-mistral-small-24b-instruct-2501. You can deploy the model on SageMaker using the following code.
    from sagemaker.jumpstart.model import JumpStartModel

    accept_eula = True
    model = JumpStartModel(model_id="huggingface-llm-mistral-small-24b-instruct-2501")
    predictor = model.deploy(accept_eula=accept_eula)

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. The EULA value must be explicitly defined as True to accept the end-user license agreement (EULA). See AWS service quotas for how to request a service quota increase.
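For example, the following sketch overrides the default instance type and endpoint name; both values are illustrative assumptions, so choose an instance type for which you have a service quota in your Region:

    # A hedged sketch of non-default deployment configuration.
    # The instance type and endpoint name are illustrative assumptions.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(
        model_id="huggingface-llm-mistral-small-24b-instruct-2501",
        instance_type="ml.g6.12xlarge",  # override the default instance type
    )
    predictor = model.deploy(
        accept_eula=True,  # must be explicitly set to accept the EULA
        endpoint_name="mistral-small-24b-demo",  # hypothetical endpoint name
    )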

    After the model is deployed, you can run inference against the deployed endpoint through the SageMaker predictor:
    prompt = "Hello!"
    payload = {
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "max_tokens": 4000,
        "temperature": 0.1,
        "top_p": 0.9,
    }
    response = predictor.predict(payload)
    print(response['choices'][0]['message']['content'])

Retail math example

Here’s an example of how Mistral-Small-24B-Instruct-2501 can break down a common shopping scenario. In this case, you ask the model to calculate the final price of a shirt after applying multiple discounts—a situation many of us face while shopping. Notice how the model provides a clear, step-by-step solution to follow.

prompt = "A store is having a 20% off sale, and you have an additional 10% off coupon. If you buy a shirt that originally costs $50, how much will you pay?"
payload = {
    "messages": [
        {
            "role": "user",
            "content": prompt
        }
    ],
    "max_tokens": 1000,
    "temperature": 0.1,
    "top_p": 0.9,
}
response = predictor.predict(payload)
print(response['choices'][0]['message']['content'])

The following is the output:

First, we'll apply the 20% off sale discount to the original price of the shirt.

20% of $50 is calculated as:
0.20 * $50 = $10

So, the price after the 20% discount is:
$50 - $10 = $40

Next, we'll apply the additional 10% off coupon to the new price of $40.

10% of $40 is calculated as:
0.10 * $40 = $4

So, the price after the additional 10% discount is:
$40 - $4 = $36

Therefore, you will pay $36 for the shirt.

The response shows clear step-by-step reasoning without introducing incorrect information or hallucinated facts. Each mathematical step is explicitly shown, making it simple to verify the accuracy of the calculations.
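As a quick sanity check, the two discounts compose multiplicatively, so the same answer falls out of a one-line calculation:

    # Sequential percentage discounts multiply: 50 * 0.80 * 0.90 = 36.
    original_price = 50.0
    final_price = original_price * (1 - 0.20) * (1 - 0.10)
    print(final_price)  # 36.0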

Clean up

To avoid unwanted charges, complete the following steps in this section to clean up your resources.

Delete the Amazon Bedrock Marketplace deployment

If you deployed the model using Amazon Bedrock Marketplace, complete the following steps:

    On the Amazon Bedrock console, under Foundation models in the navigation pane, select Marketplace deployments.
    In the Managed deployments section, locate the endpoint you want to delete.
    Select the endpoint, and on the Actions menu, select Delete.
    Verify the endpoint details to make sure you're deleting the correct deployment:
      Endpoint name
      Model name
      Endpoint status
    Choose Delete to delete the endpoint.
    In the deletion confirmation dialog, review the warning message, enter confirm, and choose Delete to permanently remove the endpoint.
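If you prefer to clean up programmatically, the following sketch assumes your boto3 version exposes the Bedrock Marketplace management operations and uses a placeholder ARN:

    # A hedged sketch: delete the Marketplace deployment with boto3.
    # Assumes a boto3 version that includes the Bedrock Marketplace APIs;
    # the endpoint ARN is a placeholder.
    import boto3

    bedrock = boto3.client("bedrock")
    bedrock.delete_marketplace_model_endpoint(
        endpointArn="arn:aws:sagemaker:us-east-1:111122223333:endpoint/my-mistral-endpoint"
    )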

Delete the SageMaker JumpStart predictor

After you’re done running the notebook, make sure to delete all resources that you created in the process to avoid additional billing. For more details, see Delete Endpoints and Resources.

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Mistral-Small-24B-Instruct-2501 in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.

For more Mistral resources on AWS, check out the Mistral-on-AWS GitHub repo.


About the Authors

Niithiyn Vijeaswaran is a Generative AI Specialist Solutions Architect with the Third-Party Model Science team at AWS. His area of focus is AWS AI accelerators (AWS Neuron). He holds a Bachelor’s degree in Computer Science and Bioinformatics.

Preston Tuggle is a Sr. Specialist Solutions Architect working on generative AI.

Shane Rai is a Principal Generative AI Specialist with the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AI/ML services offered by AWS, including model offerings from top tier foundation model providers.

Avan Bala is a Solutions Architect at AWS. His area of focus is AI for DevOps and machine learning. He holds a bachelor’s degree in Computer Science with a minor in Mathematics and Statistics from the University of Maryland. Avan is currently working with the Enterprise Engaged East Team and likes to specialize in projects about emerging AI technologies.

Banu Nagasundaram leads product, engineering, and strategic partnerships for Amazon SageMaker JumpStart, the machine learning and generative AI hub provided by SageMaker. She is passionate about building solutions that help customers accelerate their AI journey and unlock business value.
