AWS Machine Learning Blog
Minimize generative AI hallucinations with Amazon Bedrock Automated Reasoning checks

This article explores how Automated Reasoning checks in Amazon Bedrock can help prevent hallucinations in generative AI applications. As generative AI is adopted across industries, ensuring the accuracy of AI outputs has become critical. Amazon Bedrock Automated Reasoning checks use mathematical logic and formal verification to compare AI outputs against established rules, providing reliable validation results and helping enterprises build more trustworthy AI applications. The article details the Automated Reasoning workflow, its key features, and its use within Amazon Bedrock Guardrails, with the goal of improving the reliability and accuracy of AI applications.

🧠 Automated Reasoning uses mathematical proof techniques and formal logical deduction to verify, with absolute certainty under given assumptions, that LLM outputs comply with rules and requirements, addressing the common "hallucination" problem in generative AI applications.

📜 Automated Reasoning checks are policy-based, letting organizations encode rules, procedures, and guidelines into a structured mathematical format. Users can upload documents such as HR guidelines or operational workflows, which are automatically converted into formal logic structures that are easy to manage and audit.

🗣️ The system supports natural language to logic translation, combining the natural language understanding of LLMs with the mathematical verification of a symbolic reasoning engine, so users can author policies in natural language while preserving mathematical rigor.

✅ Automated Reasoning checks provide detailed validation results, including whether content is Valid, Invalid, or No Data, with clear explanations for each decision, including extracted factual statements and suggested corrections for invalid content.

⚙️ Automated Reasoning checks integrate seamlessly with Amazon Bedrock Guardrails and are accessible through the Amazon Bedrock console and APIs, supporting both interactive and automated testing for easy integration into continuous testing workflows.

Foundation models (FMs) and generative AI are transforming enterprise operations across industries. McKinsey & Company’s recent research estimates generative AI could contribute up to $4.4 trillion annually to the global economy through enhanced operational efficiency, productivity growth of 0.1% to 0.6% annually, improved customer experience through personalized interactions, and accelerated digital transformation.

Today, organizations struggle with AI hallucination when moving generative AI applications from experimental to production environments. Model hallucination, where AI systems generate plausible but incorrect information, remains a primary concern. The 2024 Gartner CIO Generative AI Survey highlights three major risks: reasoning errors from hallucinations (59% of respondents), misinformation from bad actors (48%), and privacy concerns (44%).

To improve factual accuracy of large language model (LLM) responses, AWS announced Amazon Bedrock Automated Reasoning checks (in gated preview) at AWS re:Invent 2024. Through logic-based algorithms and mathematical validation, Automated Reasoning checks validate LLM outputs against domain knowledge encoded in the Automated Reasoning policy to help prevent factual inaccuracies. Automated Reasoning checks are part of Amazon Bedrock Guardrails, a comprehensive framework that also provides content filtering, personally identifiable information (PII) redaction, and enhanced security measures. Together, these capabilities enable organizations to implement reliable generative AI safeguards: Automated Reasoning checks address factual accuracy, while other Amazon Bedrock Guardrails features help protect against harmful content and safeguard sensitive information.

In this post, we discuss how to help prevent generative AI hallucinations using Amazon Bedrock Automated Reasoning checks.

Automated Reasoning overview

Automated Reasoning is a specialized branch of computer science that uses mathematical proof techniques and formal logical deduction to verify compliance with rules and requirements with absolute certainty under given assumptions. As organizations face increasing needs to verify complex rules and requirements with mathematical certainty, automated reasoning techniques offer powerful capabilities. For example, AWS customers have direct access to automated reasoning-based features such as IAM Access Analyzer, S3 Block Public Access, and VPC Reachability Analyzer.

Unlike probabilistic approaches prevalent in machine learning, Automated Reasoning relies on formal mathematical logic to provide definitive guarantees about what can and can't be proven. This approach mirrors the rigor of auditors verifying financial statements or compliance officers validating regulatory requirements, but with mathematical precision. By using rigorous logical frameworks and theorem-proving methodologies, Automated Reasoning can conclusively determine whether statements are true or false under given assumptions. This makes it exceptionally valuable for applications that demand high assurance and need to deliver unambiguous conclusions to their users.

The following workflow illustrates solver-based formal verification, showing both the process flow and algorithm for verifying formal system properties through logical analysis and SAT/SMT solvers.

One of the widely used Automated Reasoning techniques is SAT/SMT solving, which involves encoding a representation of rules and requirements into logical formulas. A logical formula is a mathematical expression that uses variables and logical operators to represent conditions and relationships. After the rules and requirements are encoded into these formulas, specialized tools known as solvers are applied to compute solutions that satisfy these constraints. These solvers determine whether the formulas can be satisfied—whether there exist values for variables that make the formulas true.

This process starts with two main inputs: a formal representation of the system (like code or a policy) expressed as logical formulas, and a property to analyze (such as whether certain conditions are possible or requirements can be met). The solver can return one of three possible outcomes:

    Satisfiable – The property holds, and the solver produces a concrete example (an assignment of variable values) that demonstrates it.
    Unsatisfiable – The property can't be satisfied, and the solver reports which constraints conflict.
    Unknown – The solver can't reach a conclusion, indicating the problem needs to be reformulated or analyzed differently.

This technique makes sure that you either get confirmation that the specific property holds (with a concrete example), proof that it can't be satisfied (with information on conflicting constraints), or an indication that the problem needs to be reformulated or analyzed differently.
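To make this concrete, the following is a minimal sketch of SAT/SMT solving using the open-source Z3 solver's Python bindings (the z3-solver package, which is independent of Amazon Bedrock); the constraints here are hypothetical:

# pip install z3-solver
from z3 import Int, Solver, sat, unsat

# Encode rules and requirements as logical formulas over integer variables
x, y = Int("x"), Int("y")

solver = Solver()
solver.add(x + y == 10)  # a rule the system must satisfy
solver.add(x > 7)        # an additional constraint
solver.add(y >= 0)       # the property to analyze

result = solver.check()
if result == sat:
    print("Property holds; concrete example:", solver.model())  # e.g., x = 8, y = 2
elif result == unsat:
    print("Property can't be satisfied; the constraints conflict")
else:
    print("Result unknown; the problem needs to be reformulated")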

Key features of Automated Reasoning checks

Automated Reasoning checks offer the following key features:

    Mathematically sound validation – Formal logic and theorem-proving techniques, rather than probabilistic estimates, verify whether LLM outputs comply with your rules.
    Policy-based encoding – Organizations can encode rules, procedures, and guidelines into a structured mathematical format by uploading documents such as HR guidelines or operational workflows, which are automatically converted into formal logic structures that are straightforward to manage and audit.
    Natural language to logic translation – The system combines the natural language understanding of LLMs with the mathematical rigor of a symbolic reasoning engine, so you can author and refine policies in plain language.
    Detailed validation results – Each check reports whether content is Valid, Invalid, or No Data, along with extracted factual statements and suggested corrections for invalid content.
    Integration with Amazon Bedrock Guardrails – The checks are accessible through the Amazon Bedrock console and APIs, supporting both interactive testing and automated testing in continuous testing workflows.

These features combine to create a powerful framework that helps organizations maintain factual accuracy in their AI applications while providing transparent and mathematically sound validation processes.

Solution overview

Now that we understand the key features of Automated Reasoning checks, let’s examine how this capability works within Amazon Bedrock Guardrails. The following section provides a comprehensive overview of the architecture and demonstrates how different components work together to promote factual accuracy and help prevent hallucinations in generative AI applications.

Automated Reasoning checks in Amazon Bedrock Guardrails provides an end-to-end solution for validating AI model outputs using mathematically sound principles. This automated process uses formal logic and mathematical proofs to verify responses against established policies, offering definitive validation results that can significantly improve the reliability of your AI applications.

The following solution architecture follows a systematic workflow that enables rigorous validation of model outputs.

The workflow consists of the following steps:

    1. Source documents (such as HR guidelines or operational procedures) are uploaded to the system.
    2. These documents, along with optional intent descriptions, are processed to create structured rules and variables that form the foundation of an Automated Reasoning policy.
    3. Subject matter experts review and inspect the created policy to verify accurate representation of business rules.
    4. Each validated policy is versioned and assigned a unique ARN for tracking and governance purposes.
    5. The validated Automated Reasoning policy is associated with Amazon Bedrock Guardrails, where specific policy versions can be selected for implementation. This integration enables automated validation of generative AI outputs.
    6. When the generative AI application produces a response, Amazon Bedrock Guardrails triggers the Automated Reasoning checks. The system creates logical representations of both the input question and the application's response, evaluating them against the established policy rules.
    7. The Automated Reasoning check provides detailed validation results, including whether statements are Valid, Invalid, or No Data. For each finding, it explains which rules and variables were considered, and provides suggestions for making invalid statements valid.

With this solution architecture in place, organizations can confidently deploy generative AI applications knowing that responses will be automatically validated against their established policies using mathematically sound principles.

Prerequisites

To use Automated Reasoning checks in Amazon Bedrock, make sure you have met the following prerequisites:

    An AWS account with access to Amazon Bedrock.
    Access to the Automated Reasoning checks gated preview; after you're allowlisted, you receive the service model files referenced in the SDK section of this post.
    AWS Identity and Access Management (IAM) permissions to create and manage guardrails and Automated Reasoning policies.
    A source document that describes the policies you want to encode, such as the sample LoAP policy used in this post.

Input dataset

For this post, we examine a sample Leave of Absence, Paid (LoAP) policy document as our example dataset. This policy document contains detailed guidelines covering employee eligibility criteria, duration limits, application procedures, and benefits coverage for paid leave. It's an ideal example to demonstrate how Automated Reasoning checks can validate AI-generated responses against structured business policies, because it contains clear rules and conditions that can be converted into logical statements. The document's mix of quantitative requirements (such as minimum tenure and leave duration) and qualitative conditions (like performance status and approval processes) makes it particularly suitable for showcasing the capabilities of automated reasoning validation.

The following screenshot shows an example of our policy document.
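Before walking through the console, the following is a minimal sketch of how two such rules could be expressed as logical statements, again using the z3-solver package; the rules and variable names are our own illustration, not the actual output of Automated Reasoning checks:

from z3 import Bool, Int, Solver, Implies, And, Not, unsat

# Hypothetical encoding of two LoAP rules:
#   Rule 1: an employee is eligible for LoAP only if they are full-time
#   Rule 2: LoAP duration can't exceed 12 weeks
is_full_time = Bool("is_full_time")
is_eligible = Bool("is_eligible")
leave_weeks = Int("leave_weeks")

solver = Solver()
solver.add(Implies(is_eligible, is_full_time))       # Rule 1
solver.add(Implies(is_eligible, leave_weeks <= 12))  # Rule 2

# Validate the claim "a part-time employee is eligible for LoAP"
solver.add(And(is_eligible, Not(is_full_time)))
assert solver.check() == unsat  # the claim contradicts Rule 1, so it's invalid
print("The claim is inconsistent with the policy rules")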

Start an Automated Reasoning check using the Amazon Bedrock console

The first step is to encode your knowledge—in this case, the sample LoAP policy—into an Automated Reasoning policy. Complete the following steps to initiate an Automated Reasoning check using the Amazon Bedrock console:

    On the Amazon Bedrock console, choose Automated Reasoning Preview under Safeguards in the navigation pane. Choose Create policy.

    Provide a policy name and policy description.

    Upload your source document. The source content can't be modified after creation and must not exceed 6,000 characters, with limitations on table sizes and image processing.

    Include a description of the intent of the Automated Reasoning policy you're creating. For the sample policy, you can use the following intent:
Create a logical model of the Leave of Absence, Paid (LoAP) policy in this document. Employees will ask questions about what are the eligibility requirements for the program, whether they are allowed to take LoAP and for how long, duration and benefits during the time off, and return to work. Below is an example question:
QUESTION: I am a temporary contractor working in operations. Am I eligible for LoAP?
ANSWER: No, only full-time employees are eligible for LoAP.

The policy creation process takes a few minutes to complete. After the policy is created, you can review the generated rules and variables; you can edit or remove them, or add new rules and variables.

The policy document version is outlined in the details section along with the intent description and build status.

Next, you create a guardrail in Amazon Bedrock by configuring as many filters as you need.

    On the Amazon Bedrock console, choose Guardrails under Safeguards in the navigation pane. Choose Create guardrail.

    Provide guardrail details such as a name and an optional description.

    Add an Automated Reasoning check by choosing Enable Automated Reasoning policy, and choose the policy name and version. Choose Next and complete the creation of the guardrail.

    Navigate back to the Automated Reasoning section of the Amazon Bedrock console and open the newly created policy. You can use the test playground and input sample questions and answers that represent real user interactions with your LLM. Choose the guardrail you created, then choose Submit to evaluate how your policy handles these exchanges.

After submitting, you'll be presented with one or more findings. A finding contains a set of facts that were extracted from the input Q&A and are analyzed independently. Each finding includes four key components:

    Result – Whether the statement is Valid, Invalid, or No Data.
    Rules – The policy rules that were considered in reaching that result.
    Assignments – The variable values (extracted facts) that the check derived from the question and answer.
    Suggestions – Changes that would make an invalid statement valid.

Finally, you can use the feedback suggestions to improve your LLM’s responses.

    Collect rules from valid results that include suggestions and from invalid results.
    Feed these collected variables and rules back to your LLM so it can revise its original response.
    Refine your policy:
      Edit incorrect rules using natural language.
      Improve variable descriptions when Automated Reasoning checks fail to assign values. For effective variable descriptions, include both technical definitions and common user expressions. For example, for a variable named is_full_time, "works more than 20 hours per week" is technically correct because it's a quote from the source policy, but won't help Automated Reasoning checks understand what users mean when they say "part-time." Instead, use "works more than 20 hours per week; set to true if user says 'full-time' and false if user says 'part-time'".

Start an Automated Reasoning check using Python SDK and APIs

First, you need to create an Automated Reasoning policy from your documents using the Amazon Bedrock console, as outlined in the previous section. Next, you can use the created policy with the ApplyGuardrail API to validate the responses of your generative AI application.

To use the Python SDK for validation using Automated Reasoning checks, follow these steps:

    First, set up the required configurations:
import boto3
import botocore
import os
import json

# Configuration parameters
DEFAULT_GUARDRAIL_NAME = "<YOUR_GUARDRAIL_NAME>"  # e.g., "my_policy_guardrail"
DEFAULT_AR_POLICY_VERSION = "1"

# AWS configuration
region = "us-west-2"
ar_policy = "<YOUR_AR_POLICY_ID>"  # e.g., "ABC123DEF456"
model_id = "<YOUR_MODEL_ID>"  # e.g., "anthropic.claude-3-haiku-20240307-v1:0"
    Before using Amazon Bedrock with Automated Reasoning policies, you will need to load the required service models. After being allowlisted for Amazon Bedrock access, you will receive two model files along with their corresponding version information. The following is a Python script to help you load these service models:
def add_service_model(model_file, service_name, version):
    """
    Adds a service model to the AWS configuration directory.

    Args:
        model_file (str): Path to the model file
        service_name (str): Name of the AWS service
        version (str): Service model version
    """
    # Configure paths
    source = f"models/{model_file}"  # Your downloaded model files directory
    dest_dir = os.path.expanduser(f"~/.aws/models/{service_name}/{version}")
    dest_file = f"{dest_dir}/service-2.json"
    try:
        # Create directory and copy model file
        os.makedirs(dest_dir, exist_ok=True)
        with open(source) as f:
            model = json.load(f)
        with open(dest_file, 'w') as f:
            json.dump(model, f, indent=2)
        print(f"Successfully added model for {service_name}")
        return True
    except Exception as e:
        print(f"Error adding {service_name} model: {e}")
        return False

def main():
    # Define your model files and versions
    # Replace with your actual model information provided by AWS
    models = {
        '<bedrock-model-file>.json': ('bedrock', '<bedrock-version>'),
        '<runtime-model-file>.json': ('bedrock-runtime', '<runtime-version>')
    }

    # Load each model
    for model_file, (service_name, version) in models.items():
        add_service_model(model_file, service_name, version)

if __name__ == "__main__":
    main()
    After you set up the service models, initialize the AWS clients for both Amazon Bedrock and Amazon Bedrock Runtime services. These clients will be used to interact with the models and apply guardrails.
# Initialize AWS clients
boto_session = boto3.Session(region_name=region)
runtime_client = boto_session.client("bedrock-runtime")
bedrock_client = boto_session.client("bedrock")
    Before applying Automated Reasoning policies, you need to either locate an existing guardrail or create a new one. The following code first attempts to find a guardrail by name, and if not found, creates a new guardrail with the specified Automated Reasoning policy configuration. This makes sure you have a valid guardrail to work with before proceeding with policy enforcement.
def find_guardrail_id(client, name) -> tuple[str, str]:
    """
    Finds the ID and version of a guardrail by its name.

    Args:
        client: The Bedrock client object
        name (str): Name of the guardrail to find

    Returns:
        tuple[str, str]: Guardrail ID and version if found, (None, None) otherwise
    """
    next_token = None
    while True:
        # List existing guardrails (paginated)
        resp = client.list_guardrails() if next_token is None else client.list_guardrails(nextToken=next_token)
        # Search for a matching guardrail
        for g in resp["guardrails"]:
            if g["name"] == name:
                return g["id"], g["version"]
        # Handle pagination
        if "nextToken" in resp and resp["nextToken"] != "":
            next_token = resp["nextToken"]
        else:
            break
    return None, None

# Find or create a guardrail with the AR policy
try:
    # First, try to find an existing guardrail
    guardrail_id, guardrail_version = find_guardrail_id(
        bedrock_client, DEFAULT_GUARDRAIL_NAME)

    # If not found, create a new guardrail
    if guardrail_id is None:
        create_resp = bedrock_client.create_guardrail(
            name=DEFAULT_GUARDRAIL_NAME,
            description="Automated Reasoning checks demo guardrail",
            automatedReasoningPolicyConfig={
                "policyIdentifier": ar_policy,
                "policyVersion": DEFAULT_AR_POLICY_VERSION
            },
            blockedInputMessaging='Input is blocked',
            blockedOutputsMessaging='Output is blocked',
        )
        guardrail_id = create_resp["guardrailId"]
        guardrail_version = create_resp["version"]
        print(f"✓ Created new guardrail: {guardrail_id}")
    else:
        print(f"✓ Found existing guardrail: {guardrail_id}")

except botocore.exceptions.ClientError as e:
    print(f"✗ Error managing guardrail: {str(e)}")
    raise
    When testing guardrails with Automated Reasoning policies, you need to properly format your input data. The following code shows how to structure a sample question and answer pair for validation:
def create_sample_input():
    """
    Creates a formatted sample input for guardrail validation.

    The format requires both the query and response to be properly structured
    with appropriate qualifiers.

    Returns:
        list: Formatted input for guardrail validation
    """
    sample_query = "I am a part-time employee, am I eligible for LoAP?"
    sample_response = "Yes, part time employees are allowed to use LoAP"

    return [
        {
            "text": {
                "text": sample_query,
                "qualifiers": ["query"]
            }
        },
        {
            "text": {
                "text": sample_response,
                "qualifiers": ["guard_content"]
            }
        }
    ]

# Example usage
guardrail_input = create_sample_input()
print(json.dumps(guardrail_input, indent=2))
    Now that you have your formatted input data, you can apply the guardrail with Automated Reasoning policies to validate the content. The following code sends the input to Amazon Bedrock Guardrails and returns the validation results:
guardrails_output = runtime_client.apply_guardrail(
    guardrailIdentifier=guardrail_id,
    guardrailVersion=guardrail_version,
    source="OUTPUT",
    content=guardrail_input,
)
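Before inspecting individual findings, you can check the top-level action field of the ApplyGuardrail response to see whether the guardrail intervened at all; a minimal sketch:

# Check whether the guardrail intervened on the evaluated content
action = guardrails_output.get("action")
if action == "GUARDRAIL_INTERVENED":
    # At least one configured policy flagged the content
    print("Guardrail intervened; inspect the assessments for details")
else:
    print("No intervention; the content passed the configured checks")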
    After applying guardrails, you need to extract and analyze the Automated Reasoning assessment results. The following code shows how to process the guardrail output:
# Extract the Automated Reasoning assessment
ar_assessment = None
for assessment in guardrails_output["assessments"]:
    if "automatedReasoningPolicy" in assessment:
        ar_assessment = assessment["automatedReasoningPolicy"]["findings"]
        break

if ar_assessment is None:
    print("No Automated Reasoning assessment found")
else:
    print("Automated Reasoning Assessment Results:")
    print(json.dumps(ar_assessment, indent=2))
    # Process any policy violations
    for finding in ar_assessment:
        if finding["result"] == "INVALID":
            print("\nPolicy Violations Found:")
            # Print violated rules
            for rule in finding.get("rules", []):
                print(f"Rule: {rule['description']}")
            # Print suggestions, if any
            if "suggestions" in finding:
                print("\nSuggested Corrections:")
                for suggestion in finding["suggestions"]:
                    print(f"- {suggestion}")

The output will look something like the following:

{    "result": "INVALID",    "assignments": [...],    "suggestions": [...],    "rules": [        {            "identifier": "<IDENTIFIER>",            "description": "An employee is eligible for LoAP if and only if..."        }    ]}

When a response violates Automated Reasoning policies, the system identifies which rules were violated and provides information about the conflicts. The feedback from the policy validation can be routed back to improve the model's output, promoting compliance while maintaining response quality.
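One way to close that loop is to build a revision prompt from the invalid findings and ask the model to correct its answer. The following is a minimal sketch using the Amazon Bedrock Converse API; the revise_response helper and its prompt wording are our own illustration, not part of the Automated Reasoning checks API:

def revise_response(question, answer, findings):
    """Ask the model to rewrite an answer using AR validation feedback."""
    # Collect rule descriptions and suggestions from invalid findings
    feedback = []
    for finding in findings:
        if finding.get("result") == "INVALID":
            for rule in finding.get("rules", []):
                feedback.append(f"Violated rule: {rule['description']}")
            for suggestion in finding.get("suggestions", []):
                feedback.append(f"Suggested correction: {suggestion}")
    if not feedback:
        return answer  # nothing to fix

    prompt = (
        f"Question: {question}\n"
        f"Your previous answer: {answer}\n"
        "The answer violated the following policy rules:\n"
        + "\n".join(feedback)
        + "\nRewrite the answer so it complies with these rules."
    )
    resp = runtime_client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]

# Example usage with the findings extracted earlier
revised = revise_response(
    "I am a part-time employee, am I eligible for LoAP?",
    "Yes, part time employees are allowed to use LoAP",
    ar_assessment or [],
)
print(revised)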

Possible use cases

Automated Reasoning checks can be applied across various industries to promote accuracy, compliance, and reliability in AI-generated responses while maintaining industry-specific standards and regulations. Although we have tested these checks across multiple applications, we continue to explore additional potential use cases. The following table provides some applications across different sectors.

Healthcare
    Validate AI-generated treatment recommendations against clinical care protocols and guidelines
    Verify medication dosage calculations and check for potential drug interactions
    Make sure patient education materials align with medical best practices
    Validate clinical documentation for regulatory compliance

Financial Services
    Verify investment recommendations against regulatory requirements and risk policies
    Validate customer communications for compliance with financial regulations
    Verify that credit decision explanations meet fairness and transparency guidelines
    Check transaction processing against anti-fraud and anti-money laundering policies

Travel and Hospitality
    Validate booking and ticketing policies for accuracy
    Verify loyalty program benefit calculations follow established rules
    Verify travel documentation requirements and restrictions
    Validate pricing and refund calculations

Insurance
    Verify claim processing decisions against policy terms
    Validate coverage explanations for accuracy and completeness
    Make sure that risk assessment recommendations follow underwriting guidelines
    Check policy documentation for regulatory compliance

Energy and Utilities
    Validate maintenance scheduling against equipment specifications
    Verify emergency response protocols for different scenarios
    Make sure that field operation instructions follow safety guidelines
    Check grid management decisions against operational parameters

Manufacturing
    Validate quality control procedures against industry standards
    Verify production scheduling against capacity and resource constraints
    Make sure that safety protocols are followed in operational instructions
    Check inventory management decisions against supply chain policies

Best practices for implementation

Successfully implementing Automated Reasoning checks requires careful attention to detail and a systematic approach to achieve optimal validation accuracy and reliable results. The following are some key best practices:

    Start with clear, well-structured source documents that state rules and conditions explicitly, and keep them within the documented size limits.
    Have subject matter experts review the generated rules and variables to confirm they accurately represent your business policies.
    Write variable descriptions that include both the technical definition and the expressions users commonly use, as discussed earlier.
    Test your policy in the playground with question-and-answer pairs that represent real user interactions before associating it with a production guardrail.
    Version your policies and track them by ARN so that changes remain auditable and reversible.
    Integrate the checks into your continuous testing process so regressions are caught as policies and applications evolve; a minimal test sketch follows this list.
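To illustrate the last point, here is a minimal sketch of an automated regression test that replays known question-and-answer pairs through the guardrail using pytest; the test cases, the expected results, and the reuse of runtime_client, guardrail_id, and guardrail_version from the earlier setup are assumptions:

import pytest

# Hypothetical regression cases: (question, answer, expected AR result)
CASES = [
    ("I am a part-time employee, am I eligible for LoAP?",
     "Yes, part time employees are allowed to use LoAP", "INVALID"),
    ("I am a full-time employee, am I eligible for LoAP?",
     "Yes, full-time employees are eligible for LoAP", "VALID"),
]

@pytest.mark.parametrize("question,answer,expected", CASES)
def test_automated_reasoning_check(question, answer, expected):
    content = [
        {"text": {"text": question, "qualifiers": ["query"]}},
        {"text": {"text": answer, "qualifiers": ["guard_content"]}},
    ]
    output = runtime_client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source="OUTPUT",
        content=content,
    )
    # Locate the Automated Reasoning findings in the assessments
    findings = next(
        (a["automatedReasoningPolicy"]["findings"]
         for a in output["assessments"] if "automatedReasoningPolicy" in a),
        [],
    )
    assert any(f["result"] == expected for f in findings)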

Conclusion

Amazon Bedrock Automated Reasoning checks represent a significant advancement in formally verifying the outputs of generative AI applications. By combining rigorous mathematical validation with a user-friendly interface, this feature addresses one of the most critical challenges in AI deployment: maintaining factual consistency and minimizing hallucinations. The solution’s ability to validate AI-generated responses against established policies using formal logic provides organizations with a powerful framework for building trustworthy AI applications that can be confidently deployed in production environments.

The versatility of Automated Reasoning checks, demonstrated through various industry use cases and implementation approaches, makes it a valuable tool for organizations across sectors. Whether implemented through the Amazon Bedrock console or programmatically using APIs, the feature’s comprehensive validation capabilities, detailed feedback mechanisms, and integration with existing AWS services enable organizations to establish quality control processes that scale with their needs. The best practices outlined in this post provide a foundation for organizations to maximize the benefits of this technology while maintaining high standards of accuracy.

As enterprises continue to expand their use of generative AI, the importance of automated validation mechanisms becomes increasingly critical. We encourage organizations to explore Amazon Bedrock Automated Reasoning checks and use its capabilities to build more reliable and accurate AI applications. To help you get started, we’ve provided detailed implementation guidance, practical examples, and a Jupyter notebook with code snippets in our GitHub repository that demonstrate how to effectively integrate this feature into your generative AI development workflow. Through systematic validation and continuous refinement, organizations can make sure that their AI applications deliver consistent, accurate, and trustworthy results.


About the Authors

Adewale Akinfaderin is a Sr. Data Scientist–Generative AI at Amazon Bedrock, where he contributes to cutting-edge innovations in foundation models and generative AI applications at AWS. His expertise is in reproducible and end-to-end AI/ML methods, practical implementations, and helping global customers formulate and develop scalable solutions to interdisciplinary problems. He has two graduate degrees in physics and a doctorate in engineering.

Nafi Diallo is a Sr. Applied Scientist in the Automated Reasoning Group and holds a PhD in Computer Science. She is passionate about using automated reasoning to ensure the security of computer systems, improve builder productivity, and enable the development of trustworthy and responsible AI workloads. She worked for more than 5 years in the AWS Application Security organization, helping build scalable API security testing solutions and shifting security assessment left.
