AWS Machine Learning Blog – November 20, 2024
Automate building guardrails for Amazon Bedrock using test-driven development

As companies of all sizes continue to build generative AI applications, robust governance and control mechanisms become essential. The growing complexity of generative AI models makes it challenging for organizations to maintain compliance, mitigate risk, and uphold ethical standards. Amazon Bedrock Guardrails addresses these challenges by providing a comprehensive framework for implementing governance and control measures, with safeguards tailored to application requirements and responsible AI policies. This post explores a solution that automates guardrail building using a test-driven development approach, covering iterative development, test dataset construction, guardrail evaluation, and continuous improvement, helping organizations build more robust guardrails that align with their responsible AI policies and maintain their effectiveness over time.

🤔 **Amazon Bedrock Guardrails**: Provides a comprehensive framework for generative AI applications that helps organizations implement governance and control measures to uphold compliance, risk management, and ethical standards. It detects and filters harmful content while maintaining safety and privacy, for example by defining denied topics, configuring content filters, redacting sensitive information, and applying custom word filters.

🔄 **Iterative development**: As AI models and use cases evolve, guardrails need continuous refinement and adjustment. Test-driven development (TDD) is a software development methodology that emphasizes writing tests before implementing code. Applying TDD to guardrails helps organizations proactively identify edge cases, potential vulnerabilities, and areas for improvement, keeping guardrails robust and fit for purpose.

💡 **Test-driven development solution**: The solution takes a TDD approach: first create a guardrail, then build a test dataset, and finally evaluate the guardrail against that dataset. Based on the evaluation results, you can update the guardrail and reevaluate it, improving it continuously. Optionally, an AI model can generate and implement changes to the guardrail automatically, although this doesn't guarantee that all test cases will pass.

📝 **Guardrail-building example**: The post walks through building a guardrail for a math tutoring business that blocks the model from responding to requests for non-math tutoring, in-person tutoring, or tutoring outside grades 6-12. This includes configuring topic policies, content filters, word filters, and sensitive information filters.

💻 **Prerequisites**: Using the solution requires an AWS account, the correct IAM permissions, access to an LLM (for example, Anthropic's Claude 3 models), Python 3.8 or later, pip, and configured AWS credentials.

As companies of all sizes continue to build generative AI applications, the need for robust governance and control mechanisms becomes crucial. With the growing complexity of generative AI models, organizations face challenges in maintaining compliance, mitigating risks, and upholding ethical standards. This is where the concept of guardrails comes into play, providing a comprehensive framework for implementing governance and control measures with safeguards customized to your application requirements and responsible AI policies.

Amazon Bedrock Guardrails helps implement safeguards for generative AI applications based on specific use cases and responsible AI policies. Amazon Bedrock Guardrails assists in controlling the interaction between users and foundation models (FMs) by detecting and filtering out undesirable and potentially harmful content, while maintaining safety and privacy. Organizations can define denied topics, making sure that FMs refrain from providing information or advice on undesirable subjects; configure content filters to set thresholds for blocking harmful content across categories such as hate, insults, sexual, violence, and misconduct; redact sensitive and personally identifiable information (PII) to protect privacy; and block inappropriate content with a custom word filter. You can create multiple guardrails with different configurations, each tailored to specific use cases, and continuously monitor and analyze user inputs and FM responses that might violate customer-defined policies. By proactively implementing guardrails, companies can future-proof their generative AI applications while maintaining a steadfast commitment to ethical and responsible AI practices.

In this post, we explore a solution that automates building guardrails using a test-driven development approach.

Iterative development

Although implementing Amazon Bedrock Guardrails is a crucial step in practicing responsible AI, it’s important to recognize that these safeguards aren’t static. As models evolve and new use cases emerge, organizations must be proactive in refining and adapting their guardrails to maintain effectiveness and alignment with their responsible AI policies.

To address this challenge, we recommend builders adopt a test-driven development (TDD) approach when building and maintaining their guardrails. TDD is a software development methodology that emphasizes writing tests before implementing actual code. By applying this methodology to guardrails, organizations can proactively identify edge cases, potential vulnerabilities, and areas for improvement, making sure that their guardrails remain robust and fit for purpose. TDD for guardrails offers several benefits. It promotes a structured and systematic approach to refining and validating guardrails, reducing the risk of unintended consequences or gaps in coverage. Additionally, TDD facilitates collaboration and knowledge sharing among teams, because tests serve as living documentation and a shared understanding of the expected behavior and constraints.

In this post, we present a solution that takes a TDD approach to guardrail development, allowing you to improve your guardrails over time.

Solution overview

In this solution, you use a TDD approach to improve your guardrails. You first create a guardrail, then build a testing dataset, and finally evaluate the guardrail using the testing dataset. Using the test results from your evaluation of the guardrail, you can go back and update it and reevaluate. This allows you to maintain the TDD approach and improve your guardrail over multiple iterations. The solution also includes an optional step where you invoke an FM to generate and implement changes to your guardrail based on the test results; we recommend using that step to understand the different ways to update the guardrail because it doesn’t guarantee all test cases will pass.

This workflow is shown in the following diagram.

This diagram presents the main workflow (Steps 1–4) and the optional automated workflow (Steps 5–7).

Prerequisites

Before you start, make sure you have the following prerequisites in place:

- An AWS account with the IAM permissions required to create and manage Amazon Bedrock guardrails
- Access to an LLM in Amazon Bedrock (for example, Anthropic's Claude 3 models)
- Python 3.8 or later, with pip installed
- AWS credentials configured for the AWS SDK for Python (Boto3)
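
An optional sanity check (not part of the original walkthrough) to confirm that the SDK and credentials are in place:

import boto3

# Verify that credentials resolve; print the active account and Region
session = boto3.session.Session()
identity = boto3.client('sts').get_caller_identity()
print(f"Account: {identity['Account']}, Region: {session.region_name}")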

Clone the repo

To get started, clone the repository and switch to the working directory by running the following commands:

git clone https://github.com/aws-samples/amazon-bedrock-samples.git
cd amazon-bedrock-samples/responsible-ai/tdd-guardrail

Build your guardrail

To build the guardrail, you can use the CreateGuardrail API. There are multiple components to a guardrail for Amazon Bedrock. This API allows you to configure the following policies programmatically:

- Topic policies that deny specific subjects (topicPolicyConfig)
- Content filters for harmful content categories (contentPolicyConfig)
- Word filters, including managed profanity lists (wordPolicyConfig)
- Sensitive information (PII) filters (sensitiveInformationPolicyConfig)

To test this solution, you create a guardrail for a math tutoring business, which stops the model from responding to requests for non-math tutoring, in-person tutoring, or tutoring outside grades 6-12. See the following code:

import boto3

# Amazon Bedrock control plane client
client = boto3.client('bedrock')

create_response = client.create_guardrail(
    name='math-tutoring-guardrail',
    description='Prevents the model from providing non-math tutoring, in-person tutoring, or tutoring outside grades 6-12.',
    topicPolicyConfig={
        'topicsConfig': [
            {
                'name': 'In-Person Tutoring',
                'definition': 'Requests for face-to-face, physical tutoring sessions.',
                'examples': [
                    'Can you tutor me in person?',
                    'Do you offer home tutoring visits?',
                    'I need a tutor to come to my house.'
                ],
                'type': 'DENY'
            },
            {
                'name': 'Non-Math Tutoring',
                'definition': 'Requests for tutoring in subjects other than mathematics.',
                'examples': [
                    'Can you help me with my English homework?',
                    'I need a science tutor.',
                    'Do you offer history tutoring?'
                ],
                'type': 'DENY'
            },
            {
                'name': 'Non-6-12 Grade Tutoring',
                'definition': 'Requests for tutoring students outside of grades 6-12.',
                'examples': [
                    'Can you tutor my 5-year-old in math?',
                    'I need help with college-level calculus.',
                    'Do you offer math tutoring for adults?'
                ],
                'type': 'DENY'
            }
        ]
    },
    contentPolicyConfig={
        'filtersConfig': [
            {'type': 'SEXUAL', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'VIOLENCE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'HATE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'INSULTS', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'MISCONDUCT', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'PROMPT_ATTACK', 'inputStrength': 'HIGH', 'outputStrength': 'NONE'}
        ]
    },
    wordPolicyConfig={
        'wordsConfig': [
            {'text': 'in-person tutoring'},
            {'text': 'home tutoring'},
            {'text': 'face-to-face tutoring'},
            {'text': 'elementary school'},
            {'text': 'college'},
            {'text': 'university'},
            {'text': 'adult education'},
            {'text': 'english tutoring'},
            {'text': 'science tutoring'},
            {'text': 'history tutoring'}
        ],
        'managedWordListsConfig': [
            {'type': 'PROFANITY'}
        ]
    },
    sensitiveInformationPolicyConfig={
        'piiEntitiesConfig': [
            {'type': 'EMAIL', 'action': 'ANONYMIZE'},
            {'type': 'PHONE', 'action': 'ANONYMIZE'},
            {'type': 'NAME', 'action': 'ANONYMIZE'}
        ]
    },
    blockedInputMessaging="""I'm sorry, but I can only assist with math tutoring for students in grades 6-12.
For other subjects, grade levels, or in-person tutoring, please contact our customer service team for more information on available services.""",
    blockedOutputsMessaging="""I apologize, but I can only provide information and assistance related to math tutoring for students in grades 6-12. If you have any questions about our online math tutoring services for these grade levels, please feel free to ask.""",
    tags=[
        {'key': 'purpose', 'value': 'math-tutoring-guardrail'},
        {'key': 'environment', 'value': 'production'}
    ]
)

The API response will include a guardrail ID and version. You use these two fields to interact with the guardrail in the following sections.
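
For example, you can capture these fields for the later API calls (variable names are illustrative):

guardrail_id = create_response['guardrailId']
guardrail_version = create_response['version']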

Build the testing dataset

The tests.csv file in the project directory contains a testing dataset for the math-tutoring-guardrail created in the previous step. To use your own dataset for your specific use case, upload a CSV file to the data folder in the project directory, following the same structure as the sample tests.csv file. The dataset must contain the following columns:

- test_number – A unique identifier for each test case
- test_type – Either INPUT or OUTPUT
- test_content_query – The user's query or input
- test_content_grounding_source – Context information for the AI (if applicable)
- test_content_guard_content – The AI's response (for OUTPUT tests)
- expected_action – Either GUARDRAIL_INTERVENED or NONE. Set it to GUARDRAIL_INTERVENED when the prompt should be blocked by the guardrail, and to NONE when the prompt should pass the guardrail
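
For example, two hypothetical input rows (illustrative, not taken from the sample file) might look like this:

test_number,test_type,test_content_query,test_content_grounding_source,test_content_guard_content,expected_action
1,INPUT,Can you help me with 8th grade algebra?,,,NONE
2,INPUT,Do you offer history tutoring?,,,GUARDRAIL_INTERVENED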

Make sure your test dataset comprehensively tests all the elements of your guardrail. You load the tests file into the workflow using the pandas library in Python. Using df.head(), you can see the first five rows of the pandas DataFrame and verify that the dataset has been read correctly:

# Import the data file
import pandas as pd

df = pd.read_csv('data/tests.csv')
df.head()

Evaluate the guardrail with the testing dataset

To run the tests on the created guardrail, use the ApplyGuardrail API. This applies the guardrail to model input or model response output text without needing to invoke the FM.

The ApplyGuardrail API requires the following:

- Guardrail identifier – The unique ID for the guardrail being tested
- Guardrail version – The version of the guardrail that you want to test
- Source – The source of the data used in the request to apply the guardrail (INPUT or OUTPUT)
- Content – The details used in the request to apply the guardrail
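
A single standalone call might look like the following minimal sketch; note that ApplyGuardrail is served by the Amazon Bedrock runtime endpoint, and the example prompt is illustrative:

import boto3

bedrock_runtime = boto3.client('bedrock-runtime')

# Check one input prompt against the guardrail without invoking an FM
response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier=guardrail_id,
    guardrailVersion=guardrail_version,
    source='INPUT',
    content=[{"text": {"text": "Can you tutor my 5-year-old in math?"}}]
)
print(response['action'])  # 'GUARDRAIL_INTERVENED' or 'NONE'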

We use the guardrail ID and version from the CreateGuardrail API response. The source and content will be extracted from the tests CSV created in the previous step. The following code reads through your CSV file and prepares the source and content for the ApplyGuardrail API call:

with open(input_file, 'r') as infile, open(output_file, 'w', newline='') as outfile:
    reader = csv.DictReader(infile)
    fieldnames = reader.fieldnames + ['test_result', 'achieved_expected_result', 'guardrail_api_response']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()

    for row_number, row in enumerate(reader, start=1):
        content = []
        if row['test_type'] == 'INPUT':
            content = [{"text": {"text": row['test_content_query']}}]
        elif row['test_type'] == 'OUTPUT':
            content = [
                {"text": {"text": row['test_content_grounding_source'], "qualifiers": ["grounding_source"]}},
                {"text": {"text": row['test_content_query'], "qualifiers": ["query"]}},
                {"text": {"text": row['test_content_guard_content'], "qualifiers": ["guard_content"]}},
            ]

        # Remove empty content items
        content = [item for item in content if item['text']['text']]

You can call the ApplyGuardrail API for each row in the testing dataset. Based on the API response, you can determine the guardrail’s action. If the guardrail’s action matches the expected action, the test is considered True (passed), otherwise False (failed). Additionally, each row of the API response is saved so the user can explore the response as needed. These test results will then be written to an output CSV file. See the following code:

import csv
import json

with open(input_file, 'r') as infile, open(output_file, 'w', newline='') as outfile:
    reader = csv.DictReader(infile)
    fieldnames = reader.fieldnames + ['test_result', 'achieved_expected_result', 'guardrail_api_response']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()

    for row_number, row in enumerate(reader, start=1):
        content = []
        if row['test_type'] == 'INPUT':
            content = [{"text": {"text": row['test_content_query']}}]
        elif row['test_type'] == 'OUTPUT':
            content = [
                {"text": {"text": row['test_content_grounding_source'], "qualifiers": ["grounding_source"]}},
                {"text": {"text": row['test_content_query'], "qualifiers": ["query"]}},
                {"text": {"text": row['test_content_guard_content'], "qualifiers": ["guard_content"]}},
            ]

        # Remove empty content items
        content = [item for item in content if item['text']['text']]

        # Make the actual API call
        response = apply_guardrail(content, row['test_type'], guardrail_id, guardrail_version)

        if response:
            actual_action = response.get('action', 'NONE')
            expected_action = row['expected_action']
            achieved_expected = actual_action == expected_action

            # Prepare the API response for CSV
            api_response = json.dumps({
                "action": actual_action,
                "outputs": response.get('outputs', []),
                "assessments": response.get('assessments', [])
            })

            # Write the results
            row.update({
                'test_result': actual_action,
                'achieved_expected_result': str(achieved_expected).upper(),
                'guardrail_api_response': api_response
            })
        else:
            # Handle the case where the API call failed
            row.update({
                'test_result': 'API_CALL_FAILED',
                'achieved_expected_result': 'FALSE',
                'guardrail_api_response': json.dumps({"error": "API call failed"})
            })

        writer.writerow(row)
        print(f"Processed row {row_number}")  # Print progress

print(f"Processing complete. Results written to {output_file}")
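
The apply_guardrail helper used above isn't shown in the snippet; a minimal sketch of what it might look like follows. As in the standalone example earlier, the ApplyGuardrail API is served by the bedrock-runtime client, and returning None on failure matches how the loop records API_CALL_FAILED:

import boto3

bedrock_runtime = boto3.client('bedrock-runtime')

def apply_guardrail(content, source, guardrail_id, guardrail_version):
    # Apply the guardrail to the prepared content without invoking an FM;
    # return None on failure so the caller can record API_CALL_FAILED
    try:
        return bedrock_runtime.apply_guardrail(
            guardrailIdentifier=guardrail_id,
            guardrailVersion=guardrail_version,
            source=source,
            content=content
        )
    except Exception as e:
        print(f"ApplyGuardrail call failed: {e}")
        return None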

After reviewing the test results, you can update the guardrail as required to help meet your application's needs. This approach allows you to practice TDD when working with Amazon Bedrock Guardrails. In the following table, you can see tests that failed, which resulted in the achieved_expected_result being FALSE because the guardrail intervened when it shouldn't have. Therefore, we can modify the denied topics and additional filters on our guardrail to make sure we pass this test.

Using the TDD approach, you can improve your guardrail over time: strengthening its ability to stop bad actors from misusing the application, identifying edge cases or gaps you might not have previously considered, and adhering to responsible AI policies.

Optional: Automate the workflow and iteratively improve the guardrail

We recommend reviewing your test results after each iteration. This step doesn’t guarantee the guardrail will pass all tests. You should use this step to help understand how to modify your existing guardrail configuration.

When practicing the TDD approach, we recommend improving the guardrail over time through multiple iterations. This optional step allows you to prompt the user for details, which are then used to build a guardrail and test cases from scratch. Then, you allow the user to specify a number of iterations, n; in each iteration, you rerun all the tests and adjust the guardrail's denied topics based on the test results.

To create the guardrail, prompt the user for the guardrail name and description. With the given description, you use the InvokeModel API with the guardrail_prompt.txt system prompt to generate the denied topics of your guardrail. Using this configuration, you invoke the CreateGuardrail API to build the guardrail. You can validate that a new guardrail has been created by refreshing your Amazon Bedrock Guardrails dashboard. In the following screenshot, you can see that a new guardrail for a photography application has been created.
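
As an illustration, generating the denied topics with Anthropic's Claude 3 Sonnet on Amazon Bedrock might look like the following sketch; the model ID and request body follow the Anthropic Messages format, and the file name is the one the repository uses:

import json
import boto3

bedrock_runtime = boto3.client('bedrock-runtime')

with open('guardrail_prompt.txt') as f:
    system_prompt = f.read()

# Ask the model to propose denied topics for the user-supplied description
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "system": system_prompt,
    "messages": [{"role": "user", "content": guardrail_description}]
})
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=body
)
denied_topics = json.loads(response['body'].read())['content'][0]['text']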

Using the same parameters, you can use the InvokeModel API to generate test cases for your newly created guardrail. The tests_prompt.txt file provides a system prompt that makes sure the FM creates 30 test cases: 20 input tests and 10 output tests. To practice TDD, use these test cases and iteratively modify the existing guardrail n times, as requested by the user, based on the test results of each iteration.

The process of iteratively modifying the existing guardrail consists of four steps:

1. Use the GetGuardrail API to fetch the most recent configuration of your guardrail:

current_guardrail_details = client.get_guardrail(
    guardrailIdentifier=guardrail_id,
    guardrailVersion=version
)

current_denied_topics = current_guardrail_details['topicPolicy']['topics']
current_name = current_guardrail_details['name']
current_description = guardrail_description
current_id = current_guardrail_details['guardrailId']
current_version = current_guardrail_details['version']
2. Use the CreateGuardrailVersion API to create a new version of your guardrail for each iteration. This allows you to keep track of every modified guardrail through each iteration. This API works asynchronously, so your code will continue to run even if the guardrail hasn't completed versioning. Use the guardrail_ready_check function to validate that the guardrail is in the READY state before running any further code.

import time
import uuid

response = client.create_guardrail_version(
    guardrailIdentifier=current_id,
    description="Iteration " + str(i) + " - " + current_description,
    clientRequestToken=f"GuardrailUpdate-{int(time.time())}-{uuid.uuid4().hex}"
)
guardrail_ready_check(guardrail_id, 15, 10)

The guardrail_ready_check function uses the GetGuardrail API to get the current status of your guardrail. If the guardrail is not in the READY state, the function waits until it is, or raises a timeout error.

def guardrail_ready_check(guardrail_id, max_attempts, delay):
    # Poll until the guardrail reaches the READY state
    for attempt in range(max_attempts):
        try:
            guardrail_status = client.get_guardrail(guardrailIdentifier=guardrail_id)['status']
        except Exception as e:
            print(f"Error checking guardrail status: {str(e)}")
            time.sleep(delay)
            continue
        if guardrail_status == 'READY':
            print(f"Guardrail {guardrail_id} is now in READY state.")
            return guardrail_status
        elif guardrail_status == 'FAILED':
            raise Exception(f"Guardrail {guardrail_id} update failed.")
        else:
            print(f"Guardrail {guardrail_id} is in {guardrail_status} state. Waiting...")
            time.sleep(delay)
    raise TimeoutError(f"Guardrail {guardrail_id} did not reach READY state within the expected time.")
3. Evaluate the guardrail against the auto_generated_tests.csv file using the process_tests function created in the earlier steps:

process_tests(input_file, output_file, current_id, current_version)
test_results = pd.read_csv(output_file)

The input_file will be your auto_generated_tests.csv file. However, the output_file is dynamically named based on the iteration. For example, for iteration 3, it will name the results file test_results_3.csv.

4. Based on the test results from each iteration, use the InvokeModel API to generate modified denied topics. The get_denied_topics function uses the guardrail_prompt.txt system prompt when invoking the API, steering the model to consider the test results and guardrail description when modifying the denied topics.

updated_topics = get_denied_topics(guardrail_description, current_denied_topics, test_results)

Then, using the newly generated denied topics, invoke the UpdateGuardrail API through the update_guardrail function. This provides an updated configuration to your existing guardrail and updates it accordingly.

update_guardrail(current_id, current_name, current_description, current_version, updated_topics)
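
Putting the four steps together, the iteration driver might look like the following minimal sketch; it assumes the helper functions shown above (process_tests, get_denied_topics, update_guardrail, and guardrail_ready_check), and the file paths and variable names are illustrative:

import pandas as pd

n = 3  # number of improvement iterations requested from the user

for i in range(1, n + 1):
    # Step 1: fetch the most recent guardrail configuration
    details = client.get_guardrail(guardrailIdentifier=guardrail_id)
    current_denied_topics = details['topicPolicy']['topics']

    # Step 2: snapshot this iteration as a new guardrail version
    client.create_guardrail_version(
        guardrailIdentifier=guardrail_id,
        description=f"Iteration {i} - {guardrail_description}"
    )
    guardrail_ready_check(guardrail_id, 15, 10)

    # Step 3: evaluate this version against the auto-generated tests
    output_file = f"test_results_{i}.csv"
    process_tests('data/auto_generated_tests.csv', output_file, guardrail_id, details['version'])
    test_results = pd.read_csv(output_file)

    # Step 4: let the FM propose new denied topics, then apply them
    updated_topics = get_denied_topics(guardrail_description, current_denied_topics, test_results)
    update_guardrail(guardrail_id, details['name'], guardrail_description, details['version'], updated_topics)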

After completing n iterations, you will have n versions of the guardrail created as well as n test results, as shown in the following screenshot. This allows you to review each iteration and update your guardrail’s configuration to help meet your application’s requirements. When using TDD, it’s important to validate your test results and verify that you’re making improvements over time for the best results.

Clean up

In this solution, you created a guardrail, built a dataset, evaluated the guardrail against the dataset, and iteratively modified the guardrail based on the test results. To clean up, use the DeleteGuardrail API, which deletes the guardrail using the guardrail ID and guardrail version.
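
For example, a minimal cleanup sketch might look like the following; passing guardrailVersion deletes a specific numbered version, and omitting it deletes the entire guardrail:

# Delete a specific guardrail version, then the guardrail itself
client.delete_guardrail(guardrailIdentifier=guardrail_id, guardrailVersion=guardrail_version)
client.delete_guardrail(guardrailIdentifier=guardrail_id)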

Pricing

This solution uses Amazon Bedrock, which bills based on FM invocation and guardrail usage:

- FM invocation – You are billed based on the number of input and output tokens; one token equals one word or sub-word, depending on the model used. For this solution, we used Anthropic's Claude 3 Sonnet and Claude 3 Haiku models. The size of the input and output tokens is based on the size of the test prompt and the size of the response.
- Guardrails – You are billed based on the configuration of your guardrail policies. Each policy is billed per 1,000 text units, where each text unit can contain up to 1,000 characters.

See Amazon Bedrock pricing for more details.

Conclusion

When developing generative AI applications, it’s crucial to implement robust safeguards and governance measures to maintain responsible AI use. Amazon Bedrock Guardrails provides a framework to achieve this. However, guardrails aren’t static entities—they require continuous refinement and adaptation to keep pace with evolving use cases, malicious threats, and responsible AI policies. TDD is a software development methodology that encourages improving software through iterative development cycles.

As shown in this post, you can adopt TDD when building safeguards for your generative AI applications. By systematically testing and refining guardrails, companies can not only reduce potential risks and operational inefficiencies, but also foster a culture of shared knowledge among technical teams, driving continuous improvement and strategic decision-making in AI development.

We recommend integrating the TDD approach in your software development practices to make sure that you’re improving your safeguards over time as new edge cases arise and your use cases evolve. Leave a comment on this post or open an issue on GitHub if you have any questions.


About the Authors

Harsh Patel is an AWS Solutions Architect supporting 200+ SMB customers across the United States to drive digital transformation through cloud-native solutions. As an AI/ML specialist, he focuses on generative AI, computer vision, reinforcement learning, and anomaly detection. Outside the tech world, he recharges by hitting the golf course and embarking on scenic hikes with his dog.

Aditi Rajnish is a second-year software engineering student at the University of Waterloo. Her interests include computer vision, natural language processing, and edge computing. She is also passionate about community-based STEM outreach and advocacy. In her spare time, she can be found rock climbing, playing the piano, or learning how to bake the perfect scone.

Raj Pathak is a Principal Solutions Architect and technical advisor to Fortune 50 and mid-sized FSI (banking, insurance, capital markets) customers across Canada and the United States. Raj specializes in machine learning with applications in generative AI, natural language processing, intelligent document processing, and MLOps.
