AWS Machine Learning Blog, July 25, 2024
LLM experimentation at scale using Amazon SageMaker Pipelines and MLflow


Large language models (LLMs) have achieved remarkable success in various natural language processing (NLP) tasks, but they may not always generalize well to specific domains or tasks. You may need to customize an LLM to adapt to your unique use case, improving its performance on your specific dataset or task. You can customize the model using prompt engineering, Retrieval Augmented Generation (RAG), or fine-tuning. Evaluation of a customized LLM against the base LLM (or other models) is necessary to make sure the customization process has improved the model’s performance on your specific task or dataset.

In this post, we dive into LLM customization using fine-tuning, exploring the key considerations for successful experimentation and how Amazon SageMaker with MLflow can simplify the process using Amazon SageMaker Pipelines.

LLM selection and fine-tuning journeys

When working with LLMs, customers often have different requirements. Some may be interested in evaluating and selecting the most suitable pre-trained foundation model (FM) for their use case, while others might need to fine-tune an existing model to adapt it to a specific task or domain. Let’s explore two customer journeys:

* Select and evaluate foundation models: Evaluate the performance of different pre-trained FMs on datasets and metrics relevant to your use case, then select the best model based on the results. You can use services such as Amazon SageMaker JumpStart and Amazon SageMaker Clarify to do this, including at scale, as explained in Operationalize LLM Evaluation at Scale using Amazon SageMaker Clarify and MLOps services.
* Fine-tune an LLM for a specific task or domain: Customize the LLM on task- or domain-specific data, which requires fine-tuning the model. The fine-tuning process may involve one or more experiments, each requiring multiple iterations with different combinations of datasets, hyperparameters, prompts, and fine-tuning techniques, such as full fine-tuning or parameter-efficient fine-tuning (PEFT). Each iteration can be considered a run within an experiment.

Fine-tuning an LLM can be a complex workflow for data scientists and machine learning (ML) engineers to operationalize. To simplify this process, you can use Amazon SageMaker with MLflow and SageMaker Pipelines for fine-tuning and evaluation at scale. In this post, we describe the step-by-step solution and provide the source code in the accompanying GitHub repository.

Solution overview

Running hundreds of experiments, comparing the results, and keeping a track of the ML lifecycle can become very complex. This is where MLflow can help streamline the ML lifecycle, from data preparation to model deployment. By integrating MLflow into your LLM workflow, you can efficiently manage experiment tracking, model versioning, and deployment, providing reproducibility. With MLflow, you can track and compare the performance of multiple LLM experiments, identify the best-performing models, and deploy them to production environments with confidence.

You can create workflows with SageMaker Pipelines that enable you to prepare data, fine-tune models, and evaluate model performance with simple Python code for each step.

Now you can use SageMaker managed MLflow to run LLM fine-tuning and evaluation experiments at scale. Specifically:

* MLflow manages the tracking of fine-tuning experiments, comparison of evaluation results across runs, model versioning, deployment, and configuration (such as data and hyperparameters)
* SageMaker Pipelines orchestrates multiple experiments based on the experiment configuration

The following figure shows the overview of the solution.

Prerequisites

Before you begin, make sure you have the following prerequisites in place:

Set up an MLflow tracking server

MLflow is directly integrated in Amazon SageMaker Studio. To create an MLflow tracking server to track experiments and runs, complete the following steps:

1. On the SageMaker Studio console, choose MLflow under Applications in the navigation pane.
2. For Name, enter an appropriate server name.
3. For Artifact storage location (S3 URI), enter the location of an Amazon Simple Storage Service (Amazon S3) bucket.
4. Choose Create.

The tracking server may require up to 20 minutes to initialize and become operational. When it’s running, you can note its ARN to use in the llm_fine_tuning_experiments_mlflow.ipynb notebook. The ARN will have the following format:

arn:aws:sagemaker:<region>:<account_id>:mlflow-tracking-server/<tracking_server_name>
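
If you prefer to create the tracking server programmatically instead of through the SageMaker Studio console, you can use the boto3 SageMaker client. The following is a minimal sketch; the server name, S3 URI, and role ARN are placeholders to replace with your own values.

import boto3

sm_client = boto3.client("sagemaker")

# Placeholder values; replace with your own server name, artifact bucket, and role
sm_client.create_mlflow_tracking_server(
    TrackingServerName="my-tracking-server",
    ArtifactStoreUri="s3://<your-bucket>/mlflow-artifacts",
    RoleArn="<execution_role_arn>",
)

# Creation is asynchronous; check the status and read the ARN when it's ready
response = sm_client.describe_mlflow_tracking_server(
    TrackingServerName="my-tracking-server"
)
print(response["TrackingServerStatus"], response["TrackingServerArn"])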

For subsequent steps, you can refer to the detailed description provided in this post, as well as the step-by-step instructions outlined in the llm_fine_tuning_experiments_mlflow.ipynb notebook. You can launch the notebook in Amazon SageMaker Studio Classic or SageMaker JupyterLab.

Overview of SageMaker Pipelines for experimentation at scale

We use SageMaker Pipelines to orchestrate LLM fine-tuning and evaluation experiments. With SageMaker Pipelines, you can:

MLflow integration with SageMaker Pipelines requires the tracking server ARN. You also need to add the mlflow and sagemaker-mlflow Python packages as dependencies in the pipeline setup. Then you can use MLflow in any pipeline step with the following code snippet:

import mlflow

mlflow_arn = ""         # get the tracking server ARN from step 1
experiment_name = ""    # experiment name of your choice

mlflow.set_tracking_uri(mlflow_arn)
mlflow.set_experiment(experiment_name)

with mlflow.start_run(run_name=run_name) as run:
    # code for the corresponding step
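
One way to declare these package dependencies, if you build your pipeline steps with the @step decorator from the SageMaker Python SDK, is to point each step at a requirements.txt file that lists mlflow and sagemaker-mlflow. The following is a minimal sketch; the step name, instance type, and file name are illustrative and not taken from the accompanying repository.

# requirements.txt (illustrative) lists:
#   mlflow
#   sagemaker-mlflow
import mlflow
from sagemaker.workflow.function_step import step

@step(
    name="preprocess",                # illustrative step name
    instance_type="ml.m5.xlarge",     # illustrative instance type
    dependencies="requirements.txt",  # installs mlflow and sagemaker-mlflow inside the step
)
def preprocess(mlflow_arn: str, experiment_name: str) -> str:
    mlflow.set_tracking_uri(mlflow_arn)
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run(run_name="preprocess") as run:
        # data preparation and MLflow logging code goes here
        return run.info.run_id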

Log datasets with MLflow

With MLflow, you can log your dataset information alongside other key metrics, such as hyperparameters and model evaluation. This enables tracking and reproducibility of experiments across different runs, allowing for more informed decision-making about which models perform best on specific tasks or domains. By logging your datasets with MLflow, you can store metadata, such as dataset descriptions, version numbers, and data statistics, alongside your MLflow runs.

In the preprocess step, you can log training data and evaluation data. In this example, we download the data from a Hugging Face dataset. We use HuggingFaceH4/no_robots for fine-tuning and evaluation. First, you need to set the MLflow tracking ARN and experiment name to log data. After you process the data and select the required number of rows, you can log the data using the log_input API of MLflow. See the following code:

import mlflow
import pandas as pd
from datasets import load_dataset

mlflow.set_tracking_uri(mlflow_arn)
mlflow.set_experiment(experiment_name)

dataset = load_dataset(dataset_name, split="train")
# Data processing implementation

# Data logging with MLflow
df_train = pd.DataFrame(dataset)
training_data = mlflow.data.from_pandas(df_train, source=training_input_path)
mlflow.log_input(training_data, context="training")

df_evaluate = pd.DataFrame(eval_dataset)
evaluation_data = mlflow.data.from_pandas(df_evaluate, source=eval_input_path)
mlflow.log_input(evaluation_data, context="evaluation")

Fine-tune a Llama model with LoRA and MLflow

To streamline the process of fine-tuning an LLM with Low-Rank Adaptation (LoRA), you can use MLflow to track hyperparameters and save the resulting model. You can experiment with different LoRA parameters for training and log these parameters along with other key metrics, such as training loss and evaluation metrics. This enables tracking of your fine-tuning process, allowing you to identify the most effective LoRA parameters for a given dataset and task.

For this example, we use the PEFT library from Hugging Face to fine-tune a Llama 3 model. With this library, we can perform LoRA fine-tuning, which offers faster training with reduced memory requirements. It can also work well with less training data.
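
The exact LoRA settings used in this example live in llama3_fine_tuning.py in the repository; the following sketch only illustrates the general pattern of defining a LoraConfig with the PEFT library and logging those hyperparameters to MLflow. The values and target modules shown are assumptions, not the ones from the post.

import mlflow
from peft import LoraConfig, get_peft_model

# Illustrative LoRA hyperparameters; tune these per dataset and task
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the updates
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

# base_model is assumed to be an already loaded Hugging Face causal LM
model = get_peft_model(base_model, lora_config)

# Log the LoRA hyperparameters so different runs can be compared in MLflow
mlflow.log_params(
    {
        "lora_r": lora_config.r,
        "lora_alpha": lora_config.lora_alpha,
        "lora_dropout": lora_config.lora_dropout,
    }
)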

We use the HuggingFace class from the SageMaker SDK to create a training step in SageMaker Pipelines. The actual implementation of training is defined in llama3_fine_tuning.py. Just like the previous step, we need to set the MLflow tracking URI and use the same run_id:

mlflow.set_tracking_uri(args.mlflow_arn)
mlflow.set_experiment(args.experiment_name)

with mlflow.start_run(run_id=args.run_id) as run:
    # implementation

When using the Trainer class from Transformers, you can specify where the training run should be reported. In our case, we want to log all the training arguments and metrics to MLflow:

trainer = transformers.Trainer(
    model=model,
    train_dataset=lm_train_dataset,
    eval_dataset=lm_test_dataset,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=per_device_train_batch_size,
        per_device_eval_batch_size=per_device_eval_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        gradient_checkpointing=gradient_checkpointing,
        logging_steps=2,
        num_train_epochs=num_train_epochs,
        learning_rate=learning_rate,
        bf16=True,
        save_strategy="no",
        output_dir="outputs",
        report_to="mlflow",
        run_name="llama3-peft",
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

When training is complete, you can save the full model. To do so, you need to merge the adapter weights into the base model:

model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

save_dir = "/opt/ml/model/"
model.save_pretrained(save_dir, safe_serialization=True, max_shard_size="2GB")

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(args.model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
tokenizer.save_pretrained(save_dir)

The merged model can be logged to MLflow with the model signature, which defines the expected format for model inputs and outputs, including any additional parameters needed for inference:

import mlflow
from mlflow.models import infer_signature

params = {
    "top_p": 0.9,
    "temperature": 0.9,
    "max_new_tokens": 200,
}
signature = infer_signature("inputs", "generated_text", params=params)

mlflow.transformers.log_model(
    transformers_model={"model": model, "tokenizer": tokenizer},
    signature=signature,
    artifact_path="model",
    model_config=params,
)

Evaluate the model

Model evaluation is the key step for selecting the optimal training arguments for fine-tuning the LLM on a given dataset. In this example, we use the built-in evaluation capability of MLflow through the mlflow.evaluate() API. For question answering models, the default evaluator logs exact_match, token_count, toxicity, flesch_kincaid_grade_level, and ari_grade_level.

MLflow can load the model that was logged in the fine-tuning step. The base model is downloaded from Hugging Face and adapter weights are downloaded from the logged model. See the following code:

logged_model = f"runs:/{preprocess_step_ret['run_id']}/model"
loaded_model = mlflow.pyfunc.load_model(model_uri=logged_model)

results = mlflow.evaluate(
    model=loaded_model,
    data=df,
    targets="answer",
    model_type="question-answering",
    evaluator_config={"col_mapping": {"inputs": "question"}},
)

These evaluation results are logged in MLflow in the same run that logged the data processing and fine-tuning steps.
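
The object returned by mlflow.evaluate() can also be inspected directly in the notebook. A minimal sketch, assuming the results variable from the previous snippet: aggregated metrics are in results.metrics, and per-example outputs are in the eval_results_table table.

# Aggregated metrics, such as the exact match and toxicity scores
print(results.metrics)

# Per-example inputs, predictions, and metric values
eval_table = results.tables["eval_results_table"]
print(eval_table.head())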

Create the pipeline

After you have the code ready for all the steps, you can create the pipeline:

from sagemaker import get_execution_role
from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name=pipeline_name,
    steps=[evaluate_finetuned_llama7b_instruction_mlflow],
    parameters=[lora_config],
)

You can run the pipeline using the SageMaker Studio UI or using the following code snippet in the notebook:

execution1 = pipeline.start()
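
You can also monitor the run from the notebook. A minimal sketch using the execution object returned by pipeline.start():

execution1.describe()    # overall pipeline execution status and metadata
execution1.wait()        # block until the execution completes or fails
execution1.list_steps()  # per-step status, useful for debugging failures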

Compare experiment results

After you start the pipeline, you can track the experiment in MLflow. Each run will log details of the preprocessing, fine-tuning, and evaluation steps. The preprocessing step will log training and evaluation data, and the fine-tuning step will log all training arguments and LoRA parameters. You can select these experiments and compare the results to find the optimal training parameters and best fine-tuned model.

You can open the MLflow UI from SageMaker Studio.

Then you can select the experiment to filter the runs for that experiment, and select multiple runs to compare.

When you compare, you can analyze the evaluation score against the training arguments.
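
You can also compare runs programmatically. The following sketch pulls all runs of the experiment into a pandas DataFrame with mlflow.search_runs; the exact metrics.* and params.* column names depend on what your runs logged.

import mlflow

mlflow.set_tracking_uri(mlflow_arn)

# One row per run; logged metrics and parameters appear as metrics.* and params.* columns
runs = mlflow.search_runs(experiment_names=[experiment_name])
print(runs.filter(regex=r"^(run_id|metrics\.|params\.)").head())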

Register the model

After you analyze the evaluation results of different fine-tuned models, you can select the best model and register it in MLflow. This model will be automatically synced with Amazon SageMaker Model Registry.
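
A minimal sketch of registering the chosen model from the notebook, assuming you have the run ID of the best run; the registered model name below is a placeholder.

import mlflow

mlflow.set_tracking_uri(mlflow_arn)

best_run_id = "<run_id_of_best_model>"  # placeholder
model_version = mlflow.register_model(
    model_uri=f"runs:/{best_run_id}/model",
    name="llama3-finetuned",            # placeholder registered model name
)
print(model_version.name, model_version.version)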

Deploy the model

You can deploy the model through the SageMaker console or SageMaker SDK. You can pull the model artifact from MLflow and use the ModelBuilder class to deploy the model:

from sagemaker.serve import ModelBuilder
from sagemaker.serve.mode.function_pointers import Mode
from sagemaker.serve import SchemaBuilder

model_builder = ModelBuilder(
    mode=Mode.SAGEMAKER_ENDPOINT,
    role_arn="<role_arn>",
    model_metadata={
        # both model path and tracking server ARN are required if you use an mlflow run ID
        # or mlflow model registry path as input
        "MLFLOW_MODEL_PATH": "runs:/<run_id>/model",
        "MLFLOW_TRACKING_ARN": "<MLFLOW_TRACKING_ARN>",
    },
    instance_type="ml.g5.12xlarge",
)

model = model_builder.build()
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.12xlarge")
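
When the endpoint is in service, you can send it a quick test request with the returned predictor. The payload shape below is an assumption and should match the signature of the model you deployed (here, a single inputs string plus inference parameters).

# The payload format is an assumption; adjust it to your deployed model's signature
response = predictor.predict({
    "inputs": "What is Amazon SageMaker?",
    "parameters": {"max_new_tokens": 200, "temperature": 0.9, "top_p": 0.9},
})
print(response)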

Clean up

To avoid incurring ongoing costs, delete the resources you created as part of this post:

1. Delete the MLflow tracking server.
2. Run the last cell in the notebook to delete the SageMaker pipeline:

import boto3

sagemaker_client = boto3.client('sagemaker')
response = sagemaker_client.delete_pipeline(
    PipelineName=pipeline_name,
)
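
If you also want to remove the tracking server from step 1 and the endpoint from the deployment step in code, a minimal sketch (with placeholder names) using the same boto3 client and the predictor object:

# Placeholder name; replace with the tracking server you created earlier
sagemaker_client.delete_mlflow_tracking_server(
    TrackingServerName="<tracking_server_name>"
)

# Removes the real-time endpoint created in the deployment step
predictor.delete_endpoint()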

Conclusion

In this post, we focused on how to run LLM fine-tuning and evaluation experiments at scale using SageMaker Pipelines and MLflow. You can use managed MLflow from SageMaker to compare training parameters and evaluation results to select the best model and deploy that model in SageMaker. We also provided sample code in a GitHub repository that shows the fine-tuning, evaluation, and deployment workflow for a Llama 3 model.

You can start taking advantage of SageMaker with MLflow for traditional MLOps or to run LLM experimentation at scale.


About the Authors

Jagdeep Singh Soni is a Senior Partner Solutions Architect at AWS based in the Netherlands. He uses his passion for Generative AI to help customers and partners build GenAI applications using AWS services. Jagdeep has 15 years of experience in innovation, experience engineering, digital transformation, cloud architecture and ML applications.

Dr. Sokratis Kartakis is a Principal Machine Learning and Operations Specialist Solutions Architect for Amazon Web Services. Sokratis focuses on enabling enterprise customers to industrialize their ML and generative AI solutions by exploiting AWS services and shaping their operating model, such as MLOps/FMOps/LLMOps foundations, and transformation roadmap using best development practices. He has spent over 15 years inventing, designing, leading, and implementing innovative end-to-end production-level ML and AI solutions in the domains of energy, retail, health, finance, motorsports, and more.

Kirit Thadaka is a Senior Product Manager at AWS focused on generative AI experimentation on Amazon SageMaker. Kirit has extensive experience working with customers to build scalable workflows for MLOps to make them more efficient at bringing models to production.

Piyush Kadam is a Senior Product Manager for Amazon SageMaker, a fully managed service for generative AI builders. Piyush has extensive experience delivering products that help startups and enterprise customers harness the power of foundation models.
