
Integrate Automated Reasoning checks in your application

After you deploy your Automated Reasoning policy in a guardrail (see Deploy your Automated Reasoning policy in your application), you can use it at runtime to validate LLM responses and act on the feedback. This page explains how to call the validation API, interpret the findings programmatically, and implement common integration patterns such as rewriting invalid responses and asking clarifying questions.

Automated Reasoning checks operate in detect mode only — they return findings and feedback rather than blocking content. Your application is responsible for deciding what to do with the findings: serve the response, rewrite it, ask for clarification, or fall back to a default behavior.

Integration overview

At runtime, the integration follows this flow:

User question
     │
     ▼
LLM generates response
     │
     ▼
ApplyGuardrail validates response
     │
 ┌───┴─────────────┐
 │                 │
VALID           Not VALID
 │                 │
 ▼                 ▼
Serve response   Inspect findings
to user            │
          ┌────────┴──────────────┐
          │                       │
   OTHER FINDING          TRANSLATION_AMBIGUOUS /
   TYPES                  SATISFIABLE
          │                       │
          ▼                       ▼
   Rewrite using           Ask user for
   AR feedback             clarification
          │                       │
          ▼                       ▼
   Validate again          Validate with
                           clarified input

Automated Reasoning findings are returned through any API that supports an Amazon Bedrock Guardrails configuration:

  • ApplyGuardrail — Standalone validation API. Use this when you want to validate content independently of the LLM invocation. This is the recommended approach for Automated Reasoning checks because it gives you full control over what content is validated and when.

  • Converse and InvokeModel — LLM invocation APIs with guardrail configuration. Automated Reasoning findings are returned in the trace field of the response.

  • InvokeAgent and RetrieveAndGenerate — Agent and knowledge base APIs with guardrail configuration.

This page focuses on the ApplyGuardrail API because it provides the most flexibility for implementing the rewriting and clarification patterns described below. For information about using guardrails with the other APIs, see Use a guardrail.

Open-source rewriting chatbot sample

For a complete, production-style implementation of the patterns described on this page, see the Automated Reasoning checks rewriting chatbot on GitHub. This sample application demonstrates:

  • An iterative rewriting loop where invalid responses are automatically corrected based on AR feedback.

  • Follow-up questions when the LLM needs additional context from the user to rewrite accurately.

  • A timeout mechanism that automatically resumes processing when users don't respond to clarification questions.

  • Policy context injection into LLM prompts so the LLM can reference the full policy rules during rewriting.

  • JSON audit logging of every validation iteration for compliance and debugging.

The sample uses a Python/Flask backend with a React frontend and communicates with Amazon Bedrock for LLM inference and Amazon Bedrock Guardrails for validation through the ApplyGuardrail API.

Note

The sample application includes the policy content directly in the LLM generation prompts to support any Automated Reasoning policy without requiring document uploads. In a production deployment, you would typically use RAG content or feed the LLM the original natural language document instead of the Automated Reasoning policy source code.

Call ApplyGuardrail with Automated Reasoning checks

Use the ApplyGuardrail API to validate content against your guardrail. The API accepts one or more content blocks and returns an assessment that includes Automated Reasoning findings.

Request structure

guardrailIdentifier (required)

The guardrail ID or ARN. Use the guardrail that has your Automated Reasoning policy attached.

guardrailVersion (required)

The guardrail version number (for example, 1). Use a numbered version for production workloads, not DRAFT.

source (required)

Set to OUTPUT when validating LLM responses. Set to INPUT when validating user prompts. For Automated Reasoning checks, you typically validate the LLM output.

content (required)

An array of content blocks to validate. Each block contains a text field with the content to check. You can pass the user question and the LLM response as separate content blocks, or combine them into a single block.
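The content array described above can be built with a small helper. The following is a minimal sketch; the helper name and `combine` parameter are illustrative, not part of the API. Passing the user question alongside the response lets Automated Reasoning checks extract premises from the question when validating the response.

```python
# Sketch: build the ApplyGuardrail `content` array from a user question and
# an LLM response. The helper name and combine flag are illustrative.

def build_content_blocks(user_question, llm_response, combine=True):
    """Return a content array for ApplyGuardrail.

    With combine=True the question and response are sent as one block;
    otherwise they are sent as separate content blocks.
    """
    if combine:
        text = f"User: {user_question}\nAssistant: {llm_response}"
        return [{"text": {"text": text}}]
    return [
        {"text": {"text": user_question}},
        {"text": {"text": llm_response}},
    ]

blocks = build_content_blocks(
    "Am I eligible for parental leave?",
    "Yes, you are eligible for parental leave.",
)
```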

Example: Validate an LLM response using the AWS CLI

aws bedrock-runtime apply-guardrail \
  --guardrail-identifier "your-guardrail-id" \
  --guardrail-version "1" \
  --source OUTPUT \
  --content '[
    {
      "text": {
        "text": "User: Am I eligible for parental leave if I have been working here for 2 years full-time?\nAssistant: Yes, you are eligible for parental leave."
      }
    }
  ]'

Example: Validate an LLM response using Python (boto3)

import boto3
import json

bedrock_runtime = boto3.client("bedrock-runtime", region_name="eusc-de-east-1")

response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="your-guardrail-id",
    guardrailVersion="1",
    source="OUTPUT",
    content=[
        {
            "text": {
                "text": (
                    "User: Am I eligible for parental leave if I have been "
                    "working here for 2 years full-time?\n"
                    "Assistant: Yes, you are eligible for parental leave."
                )
            }
        }
    ],
)

# The AR findings are in the assessments
for assessment in response.get("assessments", []):
    ar_assessment = assessment.get("automatedReasoningPolicy", {})
    findings = ar_assessment.get("findings", [])
    for finding in findings:
        # Each finding is a union — exactly one key is present
        # Possible keys: valid, invalid, satisfiable, impossible,
        # translationAmbiguous, tooComplex, noTranslations
        print(json.dumps(finding, indent=2, default=str))

Response structure

The ApplyGuardrail response includes an assessments array. Each assessment contains an automatedReasoningPolicy object with a findings array. Each finding is a union type — exactly one of the following keys is present:

  • valid

  • invalid

  • satisfiable

  • impossible

  • translationAmbiguous

  • tooComplex

  • noTranslations

For a detailed description of each finding type and its fields, see Findings and validation results.

Interpret AR findings at runtime

To act on Automated Reasoning findings programmatically, your application needs to extract the finding type, the translation details, and the supporting or contradicting rules. The following sections explain how to parse each part of a finding.

Determine the finding type

Each finding is a union — exactly one key is present. Check which key exists to determine the finding type:

def get_finding_type(finding):
    """Return the finding type and its data from an AR finding union."""
    for finding_type in [
        "valid", "invalid", "satisfiable", "impossible",
        "translationAmbiguous", "tooComplex", "noTranslations",
    ]:
        if finding_type in finding:
            return finding_type, finding[finding_type]
    return None, None

Read the translation

Most finding types include a translation object that shows how Automated Reasoning checks translated the natural language input into formal logic. The translation contains:

  • premises — The conditions extracted from the input (for example, isFullTime = true, tenureMonths = 24).

  • claims — The assertions to validate (for example, eligibleForParentalLeave = true).

  • untranslatedPremises — Parts of the input that could not be mapped to policy variables. These parts are not validated.

  • untranslatedClaims — Claims that could not be mapped to policy variables.

Check untranslatedPremises and untranslatedClaims to understand the scope of the validation. A VALID result only covers the translated claims — untranslated content is not verified.
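A small helper can surface this coverage check before your application trusts a VALID result. The following is a sketch that assumes each untranslated entry carries a text field with the original fragment, as described above; verify the field names against your actual responses.

```python
# Sketch: report how much of the input was actually translated, so a VALID
# result is not over-trusted. Assumes untranslated entries expose a "text"
# field with the original fragment; helper name is illustrative.

def translation_coverage(finding_data):
    """Return (fully_translated, untranslated_texts) for one finding."""
    translation = finding_data.get("translation", {})
    leftovers = []
    for key in ("untranslatedPremises", "untranslatedClaims"):
        for item in translation.get(key, []):
            leftovers.append(item.get("text", ""))
    return (len(leftovers) == 0, leftovers)

ok, leftovers = translation_coverage(
    {"translation": {"untranslatedClaims": [{"text": "and free parking"}]}}
)
# ok is False; "and free parking" was never validated
```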

Read the supporting or contradicting rules

Depending on the finding type, the finding includes rules that explain the result:

  • valid findings include supportingRules — the policy rules that prove the claims are correct.

  • invalid findings include contradictingRules — the policy rules that the claims violate.

  • satisfiable findings include both a claimsTrueScenario and a claimsFalseScenario — showing the conditions under which the claims are true and false.

These rules and scenarios are the key inputs for the rewriting pattern described in Rewrite invalid responses using AR feedback.
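Extracting that explanatory material can be done with a single dispatch on the finding type. The sketch below follows the field names described above (supportingRules, contradictingRules, claimsTrueScenario, claimsFalseScenario); treat the exact shapes as assumptions and check them against your responses.

```python
# Sketch: pull the explanatory material out of a finding so it can be fed
# into a rewriting prompt. Field names follow the shapes described above;
# the helper name is illustrative.

def explain_finding(finding_type, finding_data):
    """Return the rule ids / scenarios relevant to the finding type."""
    if finding_type == "valid":
        return {
            "supporting_rules": [
                r.get("identifier")
                for r in finding_data.get("supportingRules", [])
            ]
        }
    if finding_type == "invalid":
        return {
            "contradicting_rules": [
                r.get("identifier")
                for r in finding_data.get("contradictingRules", [])
            ]
        }
    if finding_type == "satisfiable":
        return {
            "claims_true_scenario": finding_data.get("claimsTrueScenario"),
            "claims_false_scenario": finding_data.get("claimsFalseScenario"),
        }
    return {}

info = explain_finding(
    "invalid", {"contradictingRules": [{"identifier": "A1B2C3D4E5F6"}]}
)
```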

Determine the aggregate result

A single validation request can return multiple findings. To determine the overall result, sort findings by severity and select the worst. The severity order from worst to best is: TRANSLATION_AMBIGUOUS, INVALID, IMPOSSIBLE, SATISFIABLE, NO_TRANSLATIONS, VALID, TOO_COMPLEX.

SEVERITY_ORDER = {
    "translationAmbiguous": 0,
    "invalid": 1,
    "impossible": 2,
    "satisfiable": 3,
    "noTranslations": 4,
    "valid": 5,
    "tooComplex": 6,
}

def get_aggregate_result(findings):
    """Return the worst finding type from a list of findings."""
    worst = None
    worst_severity = float("inf")
    for finding in findings:
        finding_type, _ = get_finding_type(finding)
        severity = SEVERITY_ORDER.get(finding_type, 0)
        if severity < worst_severity:
            worst_severity = severity
            worst = finding_type
    return worst

Handle validation outcomes in your application

Use the aggregate result to decide what your application does next. The following table summarizes the recommended action for each result type.

For each result, the meaning and the recommended action are:

  • valid: The response is mathematically proven correct given the premises and your policy rules. Recommended action: Serve the response to the user. Log the finding for audit purposes (see Build an audit trail).

  • invalid: The response contradicts your policy rules. The contradictingRules field identifies which rules were violated. Recommended action: Rewrite the response using the AR feedback (see Rewrite invalid responses using AR feedback). If rewriting fails after multiple attempts, block the response and return a fallback message.

  • satisfiable: The response is correct under some conditions but not all. It's not wrong, but it's incomplete: it doesn't mention all the requirements. Recommended action: Rewrite the response to include the missing conditions. Use the claimsFalseScenario to identify what's missing. Alternatively, you can let your LLM ask the user clarifying questions.

  • impossible: The premises are contradictory, or the policy contains conflicting rules. Recommended action: Ask the user to clarify their input (see Ask clarifying questions). If the issue persists, it may indicate a policy problem; review the quality report.

  • translationAmbiguous: The input has multiple valid interpretations; the translation models disagreed on how to map the natural language to policy variables. Recommended action: Ask the user for clarification to resolve the ambiguity. Use the options and differenceScenarios fields to generate targeted clarifying questions.

  • tooComplex: The input exceeds processing limits for logical analysis. Recommended action: Simplify the input by breaking it into smaller parts, or return a fallback message explaining that the response could not be verified.

  • noTranslations: The input is not relevant to your policy's domain. No policy variables could be mapped, so the content is off-topic for this policy. Recommended action: Serve the response without AR validation, or use other guardrail components (such as topic policies) to handle off-topic content.
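These recommendations can be encoded as a simple dispatch table. The action names and helper below are illustrative placeholders; your application supplies the actual serve, rewrite, and clarify implementations.

```python
# Sketch: map the aggregate finding type to an application action, following
# the recommendations above. Action names are illustrative placeholders.

ACTIONS = {
    "valid": "serve",
    "invalid": "rewrite",
    "satisfiable": "rewrite",  # or "clarify", depending on your UX
    "impossible": "clarify",
    "translationAmbiguous": "clarify",
    "tooComplex": "fallback",
    "noTranslations": "serve_unvalidated",
}

def decide_action(aggregate_result):
    """Return the recommended next step for an aggregate finding type."""
    return ACTIONS.get(aggregate_result, "fallback")
```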

Rewrite invalid responses using AR feedback

The most powerful integration pattern for Automated Reasoning checks is the rewriting loop: when a response is invalid or satisfiable, your application constructs a prompt that includes the original response, the specific findings, and the policy rules, then asks the LLM to rewrite the response to be consistent with the policy. The rewritten response is validated again, and the loop continues until the response is valid or a maximum number of iterations is reached.

Rewriting loop flow

LLM generates initial response
     │
     ▼
Validate with ApplyGuardrail ◄──────────────┐
     │                                      │
 ┌───┴──────┐                               │
 │          │                               │
VALID    Not VALID                          │
 │          │                               │
 ▼          ▼                               │
Done     Construct rewriting prompt         │
         with findings + rules              │
            │                               │
            ▼                               │
         LLM rewrites response              │
            │                               │
            ▼                               │
         Max iterations? ──── No ───────────┘
            │ Yes
            ▼
         Return best response with warning

Construct the rewriting prompt

The rewriting prompt should include three pieces of information from the AR findings:

  1. The original response that failed validation.

  2. The specific finding — including the translated premises, claims, and the contradicting or supporting rules.

  3. An instruction to rewrite the response so that it is consistent with the policy rules.

Example rewriting prompt template:

The following response was checked against our policy and found to be {finding_type}.

Original response:
{original_response}

The validation found the following issue:
- Premises (what was understood from the input): {premises}
- Claims (what was asserted): {claims}
- Contradicting rules: {contradicting_rules}

Please rewrite the response so that it is consistent with the policy document.
Keep the same helpful tone and answer the user's question accurately based on
the rules. If you cannot provide an accurate answer without more information,
explain what additional information is needed.
Tip

Always include the Retrieval Augmented Generation (RAG) content or the policy rules in your rewriting requests so the LLM has all the context it needs when rewriting. The rewriting prompt template provides the specific finding details, while the system prompt provides the broader policy context. This dual-context approach is demonstrated in the open-source rewriting chatbot sample.

Rewriting best practices

  • Set a maximum iteration count. The rewriting loop should have a hard limit (typically 2–5 iterations) to prevent infinite loops. If the response is still not valid after the maximum iterations, return the best response with a warning or fall back to a default message.

  • Process findings in priority order. When multiple findings are returned, address the most severe finding first, using the same severity order as the aggregate result: translationAmbiguous, invalid, impossible, satisfiable, noTranslations, valid, tooComplex.

  • Include policy context in the system prompt. The LLM needs access either to the source document or the full policy rules to rewrite accurately. You can use a Knowledge Base to include your documents in the generation request or use the ExportAutomatedReasoningPolicyVersion API to retrieve the policy definition and format it for the LLM.

  • Log each iteration. Record the original response, the findings, the rewriting prompt, and the rewritten response for each iteration. This audit trail is valuable for debugging and compliance (see Build an audit trail).
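The loop described above can be sketched in a few lines. In this sketch, `validate` and `rewrite` are assumed placeholders for your own integration code: `validate` wraps ApplyGuardrail and returns the aggregate result plus the findings, and `rewrite` calls your LLM with a prompt built from those findings.

```python
# Sketch of the rewriting loop. `validate` and `rewrite` are placeholders
# for your ApplyGuardrail wrapper and LLM call, respectively.

MAX_ITERATIONS = 3  # hard limit to prevent infinite loops

def rewriting_loop(response_text, validate, rewrite):
    """Validate a response and rewrite it until VALID or out of iterations.

    Returns (final_text, final_result, iterations_used).
    """
    result, findings = validate(response_text)
    iterations = 0
    while result != "valid" and iterations < MAX_ITERATIONS:
        response_text = rewrite(response_text, findings)
        result, findings = validate(response_text)
        iterations += 1
    return response_text, result, iterations
```

If the loop exits with a non-valid result, your application should attach a warning to the best response or fall back to a default message, as noted in the best practices above.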

Ask clarifying questions

When Automated Reasoning checks return translationAmbiguous, satisfiable, or impossible results, the LLM may not have enough information to rewrite the response accurately. In these cases, your application can ask the user for clarification, then incorporate the answers into the next validation attempt.

When to ask for clarification

  • translationAmbiguous — The input has multiple valid interpretations. The options field shows the competing interpretations, and the differenceScenarios field shows how they differ in practice. Use these to generate targeted questions about the specific ambiguity.

  • satisfiable — The response is correct under some conditions but not all. The claimsFalseScenario shows the conditions under which the response would be incorrect. Ask the user about those specific conditions.

  • impossible — The input contains contradictory statements. Ask the user to clarify the contradiction.

  • Rewriting fails — If the LLM cannot rewrite the response to be valid after multiple attempts, it may need additional context from the user. Ask the LLM to generate clarifying questions based on the findings.

Clarification pattern

The clarification flow works as follows:

  1. Extract the ambiguous variables or missing conditions from the AR findings.

  2. Generate clarifying questions — either programmatically from the finding fields, or by asking the LLM to formulate questions based on the findings.

  3. Present the questions to the user and collect answers.

  4. Incorporate the answers into the context and generate a new response.

  5. Validate the new response with ApplyGuardrail.

Example: Generate clarifying questions from a satisfiable finding

import json

def generate_clarifying_questions(finding_data, user_question):
    """Ask the LLM to generate clarifying questions from a SATISFIABLE finding."""
    claims_true = json.dumps(
        finding_data.get("claimsTrueScenario", {}), indent=2, default=str
    )
    claims_false = json.dumps(
        finding_data.get("claimsFalseScenario", {}), indent=2, default=str
    )
    prompt = (
        f"A user asked: {user_question}\n\n"
        f"The answer is correct when these conditions hold:\n{claims_true}\n\n"
        f"But incorrect when these conditions hold:\n{claims_false}\n\n"
        f"Generate 1-3 short, specific questions to ask the user to determine "
        f"which conditions apply to their situation. Format each question on "
        f"its own line."
    )
    # generate_response is your application's LLM invocation helper
    return generate_response(prompt, "You are a helpful assistant.")
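Steps 4 and 5 of the clarification pattern, folding the user's answers back in and revalidating, can be sketched as follows. Here `validate` is an assumed placeholder for your ApplyGuardrail wrapper, and the clarification formatting is illustrative.

```python
# Sketch: incorporate clarification answers into the validated text and run
# validation again. `validate` is a placeholder for your ApplyGuardrail
# wrapper; the Q&A formatting is illustrative.

def revalidate_with_answers(user_question, llm_response, answers, validate):
    """Append Q&A pairs to the validated text and validate again.

    `answers` maps each clarifying question to the user's answer.
    """
    clarifications = "\n".join(
        f"User clarification: {q} -> {a}" for q, a in answers.items()
    )
    combined = (
        f"User: {user_question}\n{clarifications}\nAssistant: {llm_response}"
    )
    return validate(combined)
```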

Build an audit trail

Automated Reasoning findings provide mathematically verifiable proof of validity. For regulated industries and compliance scenarios, this proof is a key differentiator — you can demonstrate that an AI response was verified against specific policy rules with specific variable assignments, not just pattern-matched or probabilistically assessed.

To build an effective audit trail, log the following information for each validation request:

  • Timestamp and request ID. When the validation occurred and a unique identifier for the request.

  • Input content. The user question and LLM response that were validated.

  • Finding type and details. The validation result (valid, invalid, etc.), the translated premises and claims, and the supporting or contradicting rules.

  • Action taken. What your application did with the finding — served the response, rewrote it, asked for clarification, or blocked it.

  • Rewriting history. If the response was rewritten, log each iteration: the original response, the rewriting prompt, the rewritten response, and the validation result for each iteration.

  • Policy version. The guardrail version and policy version used for validation. This ensures you can reproduce the validation result later.

Example: Audit log entry structure

{
  "timestamp": "2025-07-21T14:30:00Z",
  "request_id": "req-abc123",
  "guardrail_id": "your-guardrail-id",
  "guardrail_version": "1",
  "user_question": "Am I eligible for parental leave?",
  "llm_response": "Yes, you are eligible for parental leave.",
  "validation_result": "valid",
  "findings": [
    {
      "type": "valid",
      "premises": "isFullTime = true, tenureMonths = 24",
      "claims": "eligibleForParentalLeave = true",
      "supporting_rules": ["A1B2C3D4E5F6"]
    }
  ],
  "action_taken": "served_response",
  "rewrite_iterations": 0
}
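A writer for entries in this shape can be as simple as appending JSON lines. The sketch below uses a local JSONL file to stay self-contained; in production you would ship the same records to CloudWatch Logs or S3. The function name is illustrative.

```python
# Sketch: append one audit entry per validation as a JSON line. A local
# JSONL file keeps the example self-contained; in production, ship the same
# records to a durable store.

import json
from datetime import datetime, timezone

def write_audit_entry(path, *, request_id, guardrail_id, guardrail_version,
                      user_question, llm_response, validation_result,
                      findings, action_taken, rewrite_iterations=0):
    """Append one audit record to a JSONL file and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "guardrail_id": guardrail_id,
        "guardrail_version": guardrail_version,
        "user_question": user_question,
        "llm_response": llm_response,
        "validation_result": validation_result,
        "findings": findings,
        "action_taken": action_taken,
        "rewrite_iterations": rewrite_iterations,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, default=str) + "\n")
    return entry
```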
Tip

Store audit logs in a durable, tamper-evident store such as Amazon CloudWatch Logs or Amazon S3 with object lock enabled. For compliance scenarios, consider using Lake to query audit logs across your organization.