<section class="blog-post-content lb-rtxt"><p>Today, I’m happy to share that Automated Reasoning checks, a new <a href="https://aws.amazon.com/bedrock/guardrails/?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">Amazon Bedrock Guardrails</a> policy <a href="https://aws.amazon.com/blogs/aws/prevent-factual-errors-from-llm-hallucinations-with-mathematically-sound-automated-reasoning-checks-preview/?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">that we previewed during AWS re:Invent</a>, is now generally available. Automated Reasoning checks helps you validate the accuracy of content generated by <a href="https://aws.amazon.com/what-is/foundation-models/?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">foundation models (FMs)</a> against domain knowledge. This can help prevent factual errors due to AI hallucinations. The policy uses mathematical logic and formal verification techniques to validate accuracy, providing definitive rules and parameters against which AI responses are checked.</p><p>This approach is fundamentally different from probabilistic reasoning methods, which deal with uncertainty by assigning probabilities to outcomes. 
In fact, Automated Reasoning checks delivers up to 99% verification accuracy, providing provable assurance in detecting AI hallucinations while also assisting with ambiguity detection when the output of a model is open to more than one interpretation.</p><p>With general availability, you get the following new features:</p><ul><li>Support for large documents in a single build, up to 80K tokens – Process extensive documentation; we found this can add up to 100 pages of content</li><li>Simplified policy validation – Save your validation tests and run them repeatedly, making it easier to maintain and verify your policies over time</li><li>Automated scenario generation – Create test scenarios automatically from your definitions, saving time and effort while helping make coverage more comprehensive</li><li>Enhanced policy feedback – Provide natural language suggestions for policy changes, simplifying the way you can improve your policies</li><li>Customizable validation settings – Adjust confidence score thresholds to match your specific needs, giving you more control over validation strictness</li></ul><p>Let’s see how this works in practice.</p><p><strong>Creating Automated Reasoning checks in Amazon Bedrock Guardrails<br /></strong> To use Automated Reasoning checks, you first encode rules from your knowledge domain into an Automated Reasoning policy, then use the policy to validate generated content. For this scenario, I’m going to create a mortgage approval policy to safeguard an AI assistant evaluating who can qualify for a mortgage. It is important that the predictions of the AI system do not deviate from the rules and guidelines established for mortgage approval. 
These rules and guidelines are captured in a policy document written in natural language.</p><p>In the <a href="https://console.aws.amazon.com/bedrock/?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">Amazon Bedrock console</a>, I choose <strong>Automated Reasoning</strong> from the navigation pane to create a policy.</p><p>I enter the name and description of the policy and upload the PDF of the policy document. The name and description are just metadata and do not contribute to building the Automated Reasoning policy. I describe the source content to add context on how it should be translated into formal logic. For example, I explain how I plan to use the policy in my application, including sample Q&amp;A from the AI assistant.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/22/ar-checks-create-policy-1.png"><img class="aligncenter size-full wp-image-98597" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/22/ar-checks-create-policy-1.png" alt="Console screenshot." width="1232" height="1199" /></a></p><p>When the policy is ready, I land on the overview page, showing the policy details and a summary of the tests and definitions. I choose <strong>Definitions</strong> from the dropdown to examine the Automated Reasoning policy, made of rules, variables, and types that have been created to translate the natural language policy into formal logic.</p><p>The <strong>Rules</strong> describe how variables in the policy are related and are used when evaluating the generated content. In this case, for example, the rules define which thresholds apply and how some of the decisions are made. 
For traceability, each rule has its own unique ID.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/16/ar-checks-rules.png"><img class="aligncenter size-full wp-image-98432" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/16/ar-checks-rules.png" alt="Console screenshot." width="1051" height="819" /></a></p><p>The <strong>Variables</strong> represent the main concepts at play in the original natural language documents. Each variable is involved in one or more rules. Variables make complex structures easier to understand. For this scenario, some of the rules need to look at the down payment or at the credit score.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/16/ar-checks-variables.png"><img class="aligncenter size-full wp-image-98433" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/16/ar-checks-variables.png" alt="Console screenshot." width="1052" height="937" /></a></p><p>Custom <strong>Types</strong> are created for variables that are neither boolean nor numeric, such as variables that can only assume a limited number of values. In this case, there are two types of mortgage described in the policy, insured and conventional.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/16/ar-checks-types.png"><img class="aligncenter size-full wp-image-98434" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/16/ar-checks-types.png" alt="Console screenshot." width="1056" height="508" /></a></p><p>Now we can assess the quality of the initial Automated Reasoning policy through testing. I choose <strong>Tests</strong> from the dropdown. 
Here I can manually enter a test, consisting of input (optional) and output, such as a question and its possible answer from the interaction of a customer with the AI assistant. I then set the expected result from the Automated Reasoning check. The expected result can be valid (the answer is correct), invalid (the answer is not correct), or satisfiable (the answer could be true or false depending on specific assumptions). I can also assign a confidence threshold for the translation of the query/content pair from natural language to logic.</p><p>Before I enter tests manually, I use the option to automatically generate a scenario from the definitions. This is the easiest way to validate a policy and (unless you’re an expert in logic) should be the first step after the creation of the policy.</p><p>For each generated scenario, I provide an expected validation to say if it is something that can happen (satisfiable) or not (invalid). If not, I can add an annotation that can then be used to update the definitions. For a more advanced understanding of the generated scenario, I can show the formal logic representation of a test using <a href="https://smt-lib.org/">SMT-LIB</a> syntax.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/16/ar-checks-generate.png"><img class="aligncenter size-full wp-image-98435" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/16/ar-checks-generate.png" alt="Console screenshot." width="841" height="487" /></a></p><p>After using the generate scenario option, I enter a few tests manually. 
For these tests, I set different expected results: some are valid, because they follow the policy, some are invalid, because they flout the policy, and some are satisfiable, because their result depends on specific assumptions.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/22/ar-checks-add-tests-1.png"><img class="aligncenter size-full wp-image-98595" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/22/ar-checks-add-tests-1.png" alt="Console screenshot." width="1419" height="652" /></a></p><p>Then, I choose <strong>Validate all tests</strong> to see the results. All tests passed in this case. Now, when I update the policy, I can use these tests to validate that the changes didn’t introduce errors.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/16/ar-checks-summary-tests.png"><img class="aligncenter size-full wp-image-98437" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/16/ar-checks-summary-tests.png" alt="Console screenshot." width="1079" height="350" /></a></p><p>For each test, I can look at the findings. If a test doesn’t pass, I can look at the rules that created the contradiction that made the test fail and go against the expected result. Using this information, I can understand if I should add an annotation, to improve the policy, or correct the test.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/16/ar-checks-findings-test.png"><img class="aligncenter size-full wp-image-98438" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/16/ar-checks-findings-test.png" alt="Console screenshot." 
width="1075" height="459" /></a></p><p>Now that I’m satisfied with the tests, I can create a new Amazon Bedrock guardrail (or update an existing one) to use up to two Automated Reasoning policies to check the validity of the responses of the AI assistant. All six policies offered by Guardrails are modular, and can be used together or separately. For example, Automated Reasoning checks can be used with other safeguards such as content filtering and contextual grounding checks. The guardrail can be applied to models served by Amazon Bedrock or to any third-party model (such as OpenAI and Google Gemini) via the <a href="https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ApplyGuardrail.html?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">ApplyGuardrail</a> API. I can also <a href="https://strandsagents.com/latest/documentation/docs/user-guide/safety-security/guardrails/?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">use the guardrail with an agent framework such as Strands Agents</a>, including <a href="https://aws.amazon.com/blogs/aws/introducing-amazon-bedrock-agentcore-securely-deploy-and-operate-ai-agents-at-any-scale/?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">agents deployed using Amazon Bedrock AgentCore</a>.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/22/ar-checks-guardrails-1.png"><img class="aligncenter size-full wp-image-98596" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/22/ar-checks-guardrails-1.png" alt="Console screenshot." width="1235" height="861" /></a></p><p>Now that we have seen how to set up a policy, let’s look at how Automated Reasoning checks are used in practice.</p><p><strong>Customer case study – Utility outage management systems<br /></strong> When the lights go out, every minute counts. 
That’s why utility companies are turning to AI solutions to improve their outage management systems. We collaborated on a solution in this space together with <a href="https://www.pwc.com/us/en/technology/alliances/amazon-web-services/ai-solutions.html">PwC</a>. Using Automated Reasoning checks, utilities can streamline operations through:</p><ul><li>Automated protocol generation – Creates standardized procedures that meet regulatory requirements</li><li>Real-time plan validation – Ensures response plans comply with established policies</li><li>Structured workflow creation – Develops severity-based workflows with defined response targets</li></ul><p>At its core, this solution combines intelligent policy management with optimized response protocols. Automated Reasoning checks are used to assess AI-generated responses. When a response is found to be invalid or satisfiable, the result of the Automated Reasoning check is used to rewrite or enhance the answer.</p><p>This approach demonstrates how AI can transform traditional utility operations, making them more efficient, reliable, and responsive to customer needs. By combining mathematical precision with practical requirements, this solution sets a new standard for outage management in the utility sector. The result is faster response times, improved accuracy, and better outcomes for both utilities and their customers.</p><p>In the words of Matt Wood, PwC’s Global and US Commercial Technology and Innovation Officer:</p><p><em>“At PwC, we’re helping clients move from AI pilot to production with confidence—especially in highly regulated industries where the cost of a misstep is measured in more than dollars. Our collaboration with AWS on Automated Reasoning checks is a breakthrough in responsible AI: mathematically assessed safeguards, now embedded directly into Amazon Bedrock Guardrails. 
We’re proud to be AWS’s launch collaborator, bringing this innovation to life across sectors like pharma, utilities, and cloud compliance—where trust isn’t a feature, it’s a requirement.”</em></p><p><strong>Things to know<br /></strong> Automated Reasoning checks in <a href="https://aws.amazon.com/bedrock/guardrails/?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">Amazon Bedrock Guardrails</a> is generally available today in the following <a href="https://aws.amazon.com/about-aws/global-infrastructure/regions_az/?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">AWS Regions</a>: US East (Ohio, N. Virginia), US West (Oregon), and Europe (Frankfurt, Ireland, Paris).</p><p>With Automated Reasoning checks, you pay based on the amount of text processed. For more information, see <a href="https://aws.amazon.com/bedrock/pricing/?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">Amazon Bedrock pricing</a>.</p><p>To learn more, and build secure and safe AI applications, see the <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-automated-reasoning-checks.html?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">technical documentation</a> and the <a href="https://github.com/aws-samples/amazon-bedrock-samples/tree/main/responsible_ai/bedrock-automated-reasoning-checks">GitHub code samples</a>. 
Follow <a href="https://console.aws.amazon.com/bedrock/home#/automated-reasoning/policies?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">this link for direct access to the Amazon Bedrock console</a>.</p><p>The videos in this playlist include an introduction to Automated Reasoning checks, a deep dive presentation, and hands-on tutorials to create, test, and refine a policy.</p><p>— <a href="https://x.com/danilop">Danilo</a></p></section>