Integrate Automated Reasoning checks in your application
After you deploy your Automated Reasoning policy in a guardrail (see Deploy your Automated Reasoning policy in your application), you can use it at runtime to validate LLM responses and act on the feedback. This page explains how to call the validation API, interpret the findings programmatically, and implement common integration patterns such as rewriting invalid responses and asking clarifying questions.
Automated Reasoning checks operate in detect mode only — they return findings and feedback rather than blocking content. Your application is responsible for deciding what to do with the findings: serve the response, rewrite it, ask for clarification, or fall back to a default behavior.
Integration overview
At runtime, the integration follows this flow:
```
User question ──► LLM generates response ──► ApplyGuardrail validates response
                                                     │
                                           ┌─────────┴─────────┐
                                           │                   │
                                         VALID             Not VALID
                                           │                   │
                                           ▼                   ▼
                                    Serve response      Inspect findings
                                    to user                    │
                                                     ┌─────────┴─────────┐
                                                     │                   │
                                              OTHER FINDING        TRANSLATION_
                                              TYPES                AMBIGUOUS /
                                                     │             SATISFIABLE
                                                     ▼                   │
                                              Rewrite using             ▼
                                              AR feedback          Ask user for
                                                     │             clarification
                                                     ▼                   │
                                              Validate again            ▼
                                                                   Validate with
                                                                   clarified input
```
Automated Reasoning findings are returned through any API that supports an Amazon Bedrock Guardrails configuration:
- `ApplyGuardrail` — Standalone validation API. Use this when you want to validate content independently of the LLM invocation. This is the recommended approach for Automated Reasoning checks because it gives you full control over what content is validated and when.

- `Converse` and `InvokeModel` — LLM invocation APIs with guardrail configuration. Automated Reasoning findings are returned in the `trace` field of the response.

- `InvokeAgent` and `RetrieveAndGenerate` — Agent and knowledge base APIs with guardrail configuration.
This page focuses on the ApplyGuardrail API because it provides the
most flexibility for implementing the rewriting and clarification patterns described
below. For information about using guardrails with the other APIs, see Use a guardrail.
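When you do use `Converse` with a guardrail configuration, a small helper can pull the Automated Reasoning findings out of the response trace. The trace shape assumed here (`trace.guardrail.outputAssessments`, keyed by guardrail identifier) is an assumption to verify against your SDK version; treat this as a sketch, not a definitive parser.

```python
def findings_from_converse_trace(response):
    """Extract AR findings from a Converse response trace.

    Assumes assessments live under trace.guardrail.outputAssessments,
    keyed by guardrail identifier. Verify against your SDK version.
    """
    findings = []
    assessments_by_id = (
        response.get("trace", {}).get("guardrail", {}).get("outputAssessments", {})
    )
    for assessments in assessments_by_id.values():
        for assessment in assessments:
            ar_assessment = assessment.get("automatedReasoningPolicy", {})
            findings.extend(ar_assessment.get("findings", []))
    return findings
```

The helper returns an empty list when no guardrail trace is present, so callers can iterate over the result unconditionally.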
Open-source rewriting chatbot sample
For a complete, production-style implementation of the patterns described on this
page, see the Automated Reasoning checks rewriting chatbot sample. The sample demonstrates:
-
An iterative rewriting loop where invalid responses are automatically corrected based on AR feedback.
-
Follow-up questions when the LLM needs additional context from the user to rewrite accurately.
-
A timeout mechanism that automatically resumes processing when users don't respond to clarification questions.
-
Policy context injection into LLM prompts so the LLM can reference the full policy rules during rewriting.
-
JSON audit logging of every validation iteration for compliance and debugging.
The sample uses a Python/Flask backend with a React frontend and communicates with
Amazon Bedrock for LLM inference and Amazon Bedrock Guardrails for validation through the
ApplyGuardrail API.
Note
The sample application includes the policy content directly in the LLM generation prompts to support any Automated Reasoning policy without requiring document uploads. In a production deployment, you would typically use RAG content or feed the LLM the original natural language document instead of the Automated Reasoning policy source code.
Call ApplyGuardrail with Automated Reasoning checks
Use the ApplyGuardrail API to validate content against your guardrail.
The API accepts one or more content blocks and returns an assessment that includes
Automated Reasoning findings.
Request structure
- `guardrailIdentifier` (required) — The guardrail ID or ARN. Use the guardrail that has your Automated Reasoning policy attached.

- `guardrailVersion` (required) — The guardrail version number (for example, `1`). Use a numbered version for production workloads, not `DRAFT`.

- `source` (required) — Set to `OUTPUT` when validating LLM responses. Set to `INPUT` when validating user prompts. For Automated Reasoning checks, you typically validate the LLM output.

- `content` (required) — An array of content blocks to validate. Each block contains a `text` field with the content to check. You can pass the user question and the LLM response as separate content blocks, or combine them into a single block.
Example: Validate an LLM response using the AWS CLI
```shell
aws bedrock-runtime apply-guardrail \
    --guardrail-identifier "your-guardrail-id" \
    --guardrail-version "1" \
    --source OUTPUT \
    --content '[
      {
        "text": {
          "text": "User: Am I eligible for parental leave if I have been working here for 2 years full-time?\nAssistant: Yes, you are eligible for parental leave."
        }
      }
    ]'
```
Example: Validate an LLM response using Python (boto3)
```python
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="eusc-de-east-1")

response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="your-guardrail-id",
    guardrailVersion="1",
    source="OUTPUT",
    content=[
        {
            "text": {
                "text": (
                    "User: Am I eligible for parental leave if I have been "
                    "working here for 2 years full-time?\n"
                    "Assistant: Yes, you are eligible for parental leave."
                )
            }
        }
    ],
)

# The AR findings are in the assessments
for assessment in response.get("assessments", []):
    ar_assessment = assessment.get("automatedReasoningPolicy", {})
    findings = ar_assessment.get("findings", [])
    for finding in findings:
        # Each finding is a union — exactly one key is present
        # Possible keys: valid, invalid, satisfiable, impossible,
        # translationAmbiguous, tooComplex, noTranslations
        print(json.dumps(finding, indent=2, default=str))
```
Response structure
The ApplyGuardrail response includes an assessments
array. Each assessment contains an automatedReasoningPolicy object with
a findings array. Each finding is a union type — exactly one of the
following keys is present:
- `valid`
- `invalid`
- `satisfiable`
- `impossible`
- `translationAmbiguous`
- `tooComplex`
- `noTranslations`
For a detailed description of each finding type and its fields, see Findings and validation results.
Interpret AR findings at runtime
To act on Automated Reasoning findings programmatically, your application needs to extract the finding type, the translation details, and the supporting or contradicting rules. The following sections explain how to parse each part of a finding.
Determine the finding type
Each finding is a union — exactly one key is present. Check which key exists to determine the finding type:
```python
def get_finding_type(finding):
    """Return the finding type and its data from an AR finding union."""
    for finding_type in [
        "valid",
        "invalid",
        "satisfiable",
        "impossible",
        "translationAmbiguous",
        "tooComplex",
        "noTranslations",
    ]:
        if finding_type in finding:
            return finding_type, finding[finding_type]
    return None, None
```
Read the translation
Most finding types include a translation object that shows how
Automated Reasoning checks translated the natural language input into formal logic.
The translation contains:
- `premises` — The conditions extracted from the input (for example, `isFullTime = true`, `tenureMonths = 24`).

- `claims` — The assertions to validate (for example, `eligibleForParentalLeave = true`).

- `untranslatedPremises` — Parts of the input that could not be mapped to policy variables. These parts are not validated.

- `untranslatedClaims` — Claims that could not be mapped to policy variables.
Check untranslatedPremises and untranslatedClaims to
understand the scope of the validation. A VALID result only covers the
translated claims — untranslated content is not verified.
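As a sketch, a helper like the following (field names taken from the translation structure above) can flag when a result does not cover the whole input, so your application can decide whether the validation scope is acceptable:

```python
def summarize_translation(translation):
    """Summarize how much of the input was actually validated.

    `translation` is the translation object from an AR finding; missing
    fields are treated as empty lists.
    """
    claims = translation.get("claims", [])
    untranslated_premises = translation.get("untranslatedPremises", [])
    untranslated_claims = translation.get("untranslatedClaims", [])
    return {
        "validated_claims": len(claims),
        "unvalidated_claims": len(untranslated_claims),
        # True only when every premise and claim was mapped to policy variables
        "fully_translated": not untranslated_premises and not untranslated_claims,
    }
```

When `fully_translated` is false, consider logging the untranslated content or warning the user that only part of the response was verified.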
Read the supporting or contradicting rules
Depending on the finding type, the finding includes rules that explain the result:
- `valid` findings include `supportingRules` — the policy rules that prove the claims are correct.

- `invalid` findings include `contradictingRules` — the policy rules that the claims violate.

- `satisfiable` findings include both a `claimsTrueScenario` and a `claimsFalseScenario` — showing the conditions under which the claims are true and false.
These rules and scenarios are the key inputs for the rewriting pattern described in Rewrite invalid responses using AR feedback.
Determine the aggregate result
A single validation request can return multiple findings. To determine the overall
result, sort findings by severity and select the worst. The severity order from worst
to best is: TRANSLATION_AMBIGUOUS, INVALID,
IMPOSSIBLE, SATISFIABLE, NO_TRANSLATIONS,
VALID, TOO_COMPLEX.
```python
SEVERITY_ORDER = {
    "translationAmbiguous": 0,
    "invalid": 1,
    "impossible": 2,
    "satisfiable": 3,
    "noTranslations": 4,
    "valid": 5,
    "tooComplex": 6,
}

def get_aggregate_result(findings):
    """Return the worst finding type from a list of findings."""
    worst = None
    worst_severity = float("inf")
    for finding in findings:
        finding_type, _ = get_finding_type(finding)
        severity = SEVERITY_ORDER.get(finding_type, 0)
        if severity < worst_severity:
            worst_severity = severity
            worst = finding_type
    return worst
```
Handle validation outcomes in your application
Use the aggregate result to decide what your application does next. The following table summarizes the recommended action for each result type.
| Result | What it means | Recommended action |
|---|---|---|
| `valid` | The response is mathematically proven correct given the premises and your policy rules. | Serve the response to the user. Log the finding for audit purposes (see Build an audit trail). |
| `invalid` | The response contradicts your policy rules. The `contradictingRules` field identifies which rules were violated. | Rewrite the response using the AR feedback (see Rewrite invalid responses using AR feedback). If rewriting fails after multiple attempts, block the response and return a fallback message. |
| `satisfiable` | The response is correct under some conditions but not all. It's not wrong, but it's incomplete — it doesn't mention all the requirements. | Rewrite the response to include the missing conditions. Use the `claimsFalseScenario` to identify what's missing. Alternatively, you can let your LLM ask the user clarifying questions. |
| `impossible` | The premises are contradictory, or the policy contains conflicting rules. | Ask the user to clarify their input (see Ask clarifying questions). If the issue persists, it may indicate a policy problem — review the quality report. |
| `translationAmbiguous` | The input has multiple valid interpretations. The translation models disagreed on how to map the natural language to policy variables. | Ask the user for clarification to resolve the ambiguity. Use the `options` and `differenceScenarios` fields to generate targeted clarifying questions. |
| `tooComplex` | The input exceeds processing limits for logical analysis. | Simplify the input by breaking it into smaller parts, or return a fallback message explaining that the response could not be verified. |
| `noTranslations` | The input is not relevant to your policy's domain. No policy variables could be mapped. | The content is off-topic for this policy. Serve the response without AR validation, or use other guardrail components (such as topic policies) to handle off-topic content. |
Rewrite invalid responses using AR feedback
The most powerful integration pattern for Automated Reasoning checks is the
rewriting loop: when a response is invalid or
satisfiable, your application constructs a prompt that includes the
original response, the specific findings, and the policy rules, then asks the LLM to
rewrite the response to be consistent with the policy. The rewritten response is
validated again, and the loop continues until the response is valid or a
maximum number of iterations is reached.
Rewriting loop flow
```
LLM generates initial response
              │
              ▼
Validate with ApplyGuardrail ◄──────────────────┐
              │                                 │
        ┌─────┴─────┐                           │
        │           │                           │
      VALID     Not VALID                       │
        │           │                           │
        ▼           ▼                           │
      Done    Construct rewriting prompt        │
              with findings + rules             │
                    │                           │
                    ▼                           │
              LLM rewrites response             │
                    │                           │
                    ▼                           │
              Max iterations? ──── No ──────────┘
                    │
                   Yes
                    │
                    ▼
      Return best response with warning
```
Construct the rewriting prompt
The rewriting prompt should include three pieces of information from the AR findings:
-
The original response that failed validation.
-
The specific finding — including the translated premises, claims, and the contradicting or supporting rules.
-
An instruction to rewrite the response so that it is consistent with the policy rules.
Example rewriting prompt template:
```
The following response was checked against our policy and found to be {finding_type}.

Original response:
{original_response}

The validation found the following issue:
- Premises (what was understood from the input): {premises}
- Claims (what was asserted): {claims}
- Contradicting rules: {contradicting_rules}

Please rewrite the response so that it is consistent with the policy document.
Keep the same helpful tone and answer the user's question accurately based on
the rules. If you cannot provide an accurate answer without more information,
explain what additional information is needed.
```
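A function along these lines can fill the template from a finding. The rule-formatting logic is an assumption: contradicting rules may be returned as objects with an `identifier` field, so the sketch handles both plain strings and dicts:

```python
def build_rewriting_prompt(finding_type, original_response, finding_data):
    """Fill the rewriting prompt template with details from an AR finding.

    Field names follow the finding structure described earlier. Rules are
    rendered as identifiers only; include the rule text as well if you
    export the policy definition.
    """
    translation = finding_data.get("translation", {})
    premises = ", ".join(str(p) for p in translation.get("premises", []))
    claims = ", ".join(str(c) for c in translation.get("claims", []))
    rules = ", ".join(
        str(r.get("identifier", r)) if isinstance(r, dict) else str(r)
        for r in finding_data.get("contradictingRules", [])
    )
    return (
        f"The following response was checked against our policy and found "
        f"to be {finding_type}.\n\n"
        f"Original response:\n{original_response}\n\n"
        "The validation found the following issue:\n"
        f"- Premises (what was understood from the input): {premises}\n"
        f"- Claims (what was asserted): {claims}\n"
        f"- Contradicting rules: {rules}\n\n"
        "Please rewrite the response so that it is consistent with the policy "
        "document. Keep the same helpful tone and answer the user's question "
        "accurately based on the rules. If you cannot provide an accurate "
        "answer without more information, explain what additional "
        "information is needed."
    )
```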
Tip
Always include the Retrieval Augmented Generation (RAG) content or the policy
rules in your rewriting requests so the LLM has all the context it needs
when rewriting. The rewriting prompt template provides the specific finding
details, while the system prompt provides the broader policy context. This
dual-context approach is demonstrated in the open-source rewriting chatbot sample.
Rewriting best practices
- Set a maximum iteration count. The rewriting loop should have a hard limit (typically 2–5 iterations) to prevent infinite loops. If the response is still not `valid` after the maximum iterations, return the best response with a warning or fall back to a default message.

- Process findings in priority order. When multiple findings are returned, address the most severe finding first. The severity order is: `translationAmbiguous`, `invalid`, `impossible`, `satisfiable`, `valid`.

- Include policy context in the system prompt. The LLM needs access either to the source document or the full policy rules to rewrite accurately. You can use a Knowledge Base to include your documents in the generation request or use the `ExportAutomatedReasoningPolicyVersion` API to retrieve the policy definition and format it for the LLM.

- Log each iteration. Record the original response, the findings, the rewriting prompt, and the rewritten response for each iteration. This audit trail is valuable for debugging and compliance (see Build an audit trail).
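Putting these practices together, the loop can be sketched as follows. The `validate` and `rewrite` callables are placeholders for your `ApplyGuardrail` and LLM invocation wrappers, not real API calls:

```python
MAX_ITERATIONS = 3  # hard limit to prevent infinite rewriting loops

def rewriting_loop(user_question, initial_response, validate, rewrite):
    """Iteratively rewrite a response until it validates or the limit is hit.

    `validate(text) -> (finding_type, finding_data)` wraps ApplyGuardrail;
    `rewrite(prompt) -> str` wraps your LLM call. Both are placeholders for
    your own integration code. Returns (response, is_valid).
    """
    response = initial_response
    for _ in range(MAX_ITERATIONS):
        finding_type, finding_data = validate(response)
        if finding_type == "valid":
            return response, True
        # Construct the rewriting prompt from the finding and try again
        prompt = (
            f"User question: {user_question}\n"
            f"Response found {finding_type}: {response}\n"
            f"Finding details: {finding_data}\n"
            "Rewrite the response to be consistent with the policy."
        )
        response = rewrite(prompt)
    # Max iterations reached; caller should warn the user or fall back
    return response, False
```

The boolean in the return value lets the caller distinguish a verified response from a best-effort one that should carry a warning.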
Ask clarifying questions
When Automated Reasoning checks return translationAmbiguous,
satisfiable, or impossible results, the LLM may not have
enough information to rewrite the response accurately. In these cases, your application
can ask the user for clarification, then incorporate the answers into the next
validation attempt.
When to ask for clarification
- `translationAmbiguous` — The input has multiple valid interpretations. The `options` field shows the competing interpretations, and the `differenceScenarios` field shows how they differ in practice. Use these to generate targeted questions about the specific ambiguity.

- `satisfiable` — The response is correct under some conditions but not all. The `claimsFalseScenario` shows the conditions under which the response would be incorrect. Ask the user about those specific conditions.

- `impossible` — The input contains contradictory statements. Ask the user to clarify the contradiction.

- Rewriting fails — If the LLM cannot rewrite the response to be `valid` after multiple attempts, it may need additional context from the user. Ask the LLM to generate clarifying questions based on the findings.
Clarification pattern
The clarification flow works as follows:
1. Extract the ambiguous variables or missing conditions from the AR findings.

2. Generate clarifying questions — either programmatically from the finding fields, or by asking the LLM to formulate questions based on the findings.

3. Present the questions to the user and collect answers.

4. Incorporate the answers into the context and generate a new response.

5. Validate the new response with `ApplyGuardrail`.
Example: Generate clarifying questions from a `satisfiable` finding
```python
import json

def generate_clarifying_questions(finding_data, user_question):
    """Ask the LLM to generate clarifying questions from a SATISFIABLE finding."""
    claims_true = json.dumps(
        finding_data.get("claimsTrueScenario", {}), indent=2, default=str
    )
    claims_false = json.dumps(
        finding_data.get("claimsFalseScenario", {}), indent=2, default=str
    )
    prompt = (
        f"A user asked: {user_question}\n\n"
        f"The answer is correct when these conditions hold:\n{claims_true}\n\n"
        f"But incorrect when these conditions hold:\n{claims_false}\n\n"
        f"Generate 1-3 short, specific questions to ask the user to determine "
        f"which conditions apply to their situation. Format each question on "
        f"its own line."
    )
    # generate_response wraps your LLM invocation (for example, Converse)
    return generate_response(prompt, "You are a helpful assistant.")
```
Build an audit trail
Automated Reasoning findings provide mathematically verifiable proof of validity. For regulated industries and compliance scenarios, this proof is a key differentiator — you can demonstrate that an AI response was verified against specific policy rules with specific variable assignments, not just pattern-matched or probabilistically assessed.
To build an effective audit trail, log the following information for each validation request:
- Timestamp and request ID. When the validation occurred and a unique identifier for the request.

- Input content. The user question and LLM response that were validated.

- Finding type and details. The validation result (`valid`, `invalid`, etc.), the translated premises and claims, and the supporting or contradicting rules.

- Action taken. What your application did with the finding — served the response, rewrote it, asked for clarification, or blocked it.

- Rewriting history. If the response was rewritten, log each iteration: the original response, the rewriting prompt, the rewritten response, and the validation result for each iteration.

- Policy version. The guardrail version and policy version used for validation. This ensures you can reproduce the validation result later.
Example: Audit log entry structure
{ "timestamp": "2025-07-21T14:30:00Z", "request_id": "req-abc123", "guardrail_id": "your-guardrail-id", "guardrail_version": "1", "user_question": "Am I eligible for parental leave?", "llm_response": "Yes, you are eligible for parental leave.", "validation_result": "valid", "findings": [ { "type": "valid", "premises": "isFullTime = true, tenureMonths = 24", "claims": "eligibleForParentalLeave = true", "supporting_rules": ["A1B2C3D4E5F6"] } ], "action_taken": "served_response", "rewrite_iterations": 0 }
Tip
Store audit logs in a durable, tamper-evident store such as Amazon CloudWatch Logs or Amazon S3 with Object Lock enabled. For compliance scenarios, consider using AWS CloudTrail Lake to query audit logs across your organization.