Troubleshoot and refine your Automated Reasoning policy
When an Automated Reasoning policy test fails — the actual result doesn't match the expected result — the issue is either in the translation (natural language was mapped to the wrong variables or values) or in the rules (the policy logic doesn't match your domain). This page provides a systematic approach to diagnosing and fixing both types of issues.
Before you start troubleshooting, make sure you understand the two-step validation process (translate, then validate) described in Translation: from natural language to formal logic. This distinction is the key to efficient debugging.
Note
Tutorial video: For a step-by-step walkthrough of refining and troubleshooting an Automated Reasoning policy, watch the following tutorial:
Debugging workflow
When a test fails, use the actual result to identify the type of issue and jump to the relevant section.
| Actual result | Likely cause | Where to look |
|---|---|---|
TRANSLATION_AMBIGUOUS |
The translation models disagreed on how to interpret the input. Usually caused by overlapping variables, vague descriptions, or ambiguous input text. | Fix translation issues |
NO_TRANSLATIONS |
The input couldn't be mapped to any policy variables. Either the input is off-topic or the policy is missing variables for the concepts mentioned. | Fix translation issues |
TOO_COMPLEX |
The input or policy exceeds processing limits. Often caused by non-linear arithmetic or policies with too many interacting rules. | Limitations and considerations |
IMPOSSIBLE |
The premises contradict each other, or the policy itself contains conflicting rules. | Fix impossible results |
VALID, INVALID, or SATISFIABLE
(but not what you expected) |
Check the translation in the finding first. If the right variables are assigned with the right values, the issue is in your rules. If the translation is wrong, the issue is in your variable descriptions. | Translation wrong: Fix translation issues. Rules wrong: Fix rule issues. |
Tip
Always check the translation first. In most cases, the mathematical validation (step 2) is correct — the issue is in how the natural language was translated to formal logic (step 1). Fixing variable descriptions is faster and less risky than changing rules.
Fix translation issues
Translation issues occur when Automated Reasoning checks can't reliably map natural
language to your policy's variables. The most visible symptom is a
TRANSLATION_AMBIGUOUS result, but translation issues can also cause incorrect
VALID, INVALID, or SATISFIABLE results when the
wrong variables or values are assigned.
Diagnose TRANSLATION_AMBIGUOUS results
A TRANSLATION_AMBIGUOUS finding includes two key fields that help you
understand the disagreement:
-
options— The competing logical interpretations (up to 2). Each option contains its own translation with premises, claims, and confidence. Compare the options to see where the translation models disagreed. -
differenceScenarios— Scenarios (up to 2) that illustrate how the different interpretations differ in meaning, with variable assignments highlighting the practical impact of the ambiguity.
Examine these fields to identify the specific source of ambiguity, then apply the appropriate fix from the following list.
Overlapping variable definitions
When multiple variables could reasonably represent the same concept, the translation models disagree on which one to use.
Symptom: The options in the
TRANSLATION_AMBIGUOUS finding show the same concept assigned to different
variables. For example, one option assigns "2 years of service" to
tenureMonths = 24 while the other assigns it to
monthsOfService = 24.
Fix: Merge the overlapping variables into a single variable with a comprehensive description. Update all rules that reference the deleted variable to use the remaining one.
Example:
| Before (overlapping) | After (merged) |
|---|---|
|
|
(Delete |
Incomplete variable descriptions
Variable descriptions that lack detail about how users refer to concepts in everyday language make it difficult to map input to the correct variable.
Symptom: The options show the correct
variable but with different values, or the translation assigns a value that doesn't
match what the user said. For example, "2 years" is translated to
tenureMonths = 2 instead of tenureMonths = 24.
Fix: Update the variable description to include unit conversion rules, synonyms, and alternative phrasings. See Write comprehensive variable descriptions for detailed guidance.
Example:
| Before (incomplete) | After (comprehensive) |
|---|---|
isFullTime: "Full-time status." |
isFullTime: "Whether the employee works full-time (true) or
part-time (false). Set to true when users mention being 'full-time', working
'full hours', or working 40+ hours per week. Set to false when users mention
being 'part-time', working 'reduced hours', or working fewer than 40 hours per
week." |
Inconsistent value formatting
Translation ambiguity can occur when the system is unsure how to format values such as numbers, dates, or percentages.
Symptom: The options show the same
variable but with different value formats. For example, one option translates "5%" to
interestRate = 5 while the other translates it to
interestRate = 0.05.
Fix: Update the variable description to specify the expected format and include conversion rules. See Specify units and formats in variable descriptions.
Ambiguous input text
Sometimes the input itself is genuinely ambiguous — it contains vague pronouns, unclear references, or statements that can be interpreted multiple ways.
Symptom: The options show fundamentally
different interpretations of the same text. For example, "Can they take leave?" could
refer to any employee type.
Fix: If this is a test, rewrite the input to be
more specific. At runtime, your application should ask the user for clarification when
it receives a TRANSLATION_AMBIGUOUS result. For integration patterns, see
Integrate Automated Reasoning checks in your application.
Adjust the confidence threshold
If you see TRANSLATION_AMBIGUOUS results for inputs that are borderline
ambiguous, you can adjust the confidence threshold. Lowering the threshold allows
translations with less model agreement to proceed to validation, reducing
TRANSLATION_AMBIGUOUS results but increasing the risk of incorrect
translations.
Important
Adjusting the threshold should be a last resort. In most cases, improving variable descriptions or removing overlapping variables is a better fix because it addresses the root cause. For more information on how thresholds work, see Confidence thresholds.
Fix rule issues
Rule issues occur when the translation is correct but the policy logic doesn't match your domain. You've confirmed that the right variables are assigned with the right values, but the validation result is still wrong.
Getting VALID when you expected INVALID
The policy doesn't have a rule that prohibits the claim. The response contradicts your domain knowledge, but the policy allows it.
Diagnosis: Look at the
supportingRules in the finding. These are the rules that prove the claim
is valid. Check whether these rules are correct or whether a rule is missing.
Common causes and fixes:
-
Missing rule. Your policy doesn't have a rule that covers this condition. Add a new rule that captures the constraint. For example, if the policy allows parental leave for all full-time employees but should require 12 months of tenure, add:
(=> (and isFullTime (<= tenureMonths 12)) (not eligibleForParentalLeave)) -
Rule is too permissive. An existing rule allows more than it should. Edit the rule to add the missing condition. For example, change
(=> isFullTime eligibleForParentalLeave)to(=> (and isFullTime (> tenureMonths 12)) eligibleForParentalLeave) -
Missing variable. The policy doesn't have a variable to capture a relevant concept. Add the variable, write a clear description, and create rules that reference it.
Getting INVALID when you expected VALID
The policy has a rule that incorrectly prohibits the claim.
Diagnosis: Look at the
contradictingRules in the finding. These are the rules that disprove the
claim. Check whether these rules are correct.
Common causes and fixes:
-
Rule is too restrictive. An existing rule blocks a valid scenario. Edit the rule to relax the condition or add an exception. For example, if the rule requires 24 months of tenure but the policy should require only 12, update the threshold.
-
Rule was misextracted. Automated Reasoning checks misinterpreted your source document. Edit the rule to match the intended logic, or delete it and add a correct rule manually.
Getting SATISFIABLE when you expected VALID
The response is correct under some conditions but not all. The policy has additional rules that the response doesn't address.
Diagnosis: Compare the
claimsTrueScenario and claimsFalseScenario in the finding.
The difference between them shows the conditions that the response doesn't mention.
Common causes and fixes:
-
Response is incomplete. The test output doesn't mention all the conditions required by the policy. Update the test output to include the missing conditions, or change the expected result to
SATISFIABLEif incomplete responses are acceptable for your use case. -
Policy has unnecessary rules. The policy requires conditions that aren't relevant to this scenario. Review whether the additional rules should apply and remove them if they don't.
Fix impossible results
An IMPOSSIBLE result means Automated Reasoning checks can't evaluate the
claims because the premises are contradictory or the policy itself contains conflicting
rules. There are two distinct causes.
Contradictions in the input
The test input contains statements that contradict each other. For example, "I'm a
full-time employee and also part-time" sets isFullTime = true and
isFullTime = false simultaneously, which is logically impossible.
Diagnosis: Inspect the translation
premises in the finding. Look for variables that are assigned contradictory values.
Fix: If this is a test, rewrite the input to remove
the contradiction. At runtime, your application should handle IMPOSSIBLE
results by asking the user to clarify their input.
Conflicts in the policy
The policy contains rules that contradict each other, making it impossible for Automated Reasoning checks to reach a conclusion for inputs that involve the conflicting rules.
Diagnosis: If the input is valid (no contradictory
premises), the issue is in the policy. Check the contradictingRules field
in the finding to identify which rules conflict. Also check the quality report (see
Use the quality report) — it flags
conflicting rules automatically.
Common causes and fixes:
-
Contradictory rules. Two rules reach opposite conclusions for the same conditions. For example, one rule says full-time employees are eligible for leave, while another says employees in their first year are not eligible, without specifying what happens to full-time employees in their first year. Merge the rules into a single rule with explicit conditions:
(=> (and isFullTime (> tenureMonths 12)) eligibleForLeave) -
Bare assertions. A bare assertion like
(= eligibleForLeave true)makes it impossible for any input to claim the user is not eligible. Rewrite bare assertions as implications. See Use implications (=>) to structure rules. -
Circular dependencies. Rules that depend on each other in a way that creates logical loops. Simplify the rules to break the cycle, or use intermediate variables to make the logic explicit.
Use annotations to repair your policy
Annotations are targeted corrections you apply to your policy when tests fail. Instead of manually editing rules and variables, you can use annotations to describe the change you want and let Automated Reasoning checks apply it. Annotations are available through both the console and the API.
Apply annotations in the console
-
Open the failed test and review the findings to understand the issue.
-
Modify the test conditions (for example, add a premise or change the expected result) and rerun the test. If the modified test returns the result you expect, you can apply this modification as an annotation.
-
Choose Apply annotations. Automated Reasoning checks starts a build workflow to apply the changes to your policy based on your feedback.
-
On the Review policy changes screen, review the proposed changes to your policy's rules, variables, and types. Then select Accept changes.
Apply annotations using the API
Use the StartAutomatedReasoningPolicyBuildWorkflow API with
REFINE_POLICY to apply annotations programmatically. Pass the complete
current policy definition alongside the annotations.
Annotation types include:
-
Variable annotations:
addVariable,updateVariable,deleteVariable— Add missing variables, improve descriptions, or remove duplicates. -
Rule annotations:
addRule,updateRule,deleteRule,addRuleFromNaturalLanguage— Fix incorrect rules, add missing rules, or remove conflicting rules. UseaddRuleFromNaturalLanguageto describe a rule in plain English and let Automated Reasoning checks convert it to formal logic. -
Type annotations:
addType,updateType,deleteType— Manage custom types (enums). -
Feedback annotations:
updateFromRulesFeedback,updateFromScenarioFeedback— Provide natural language feedback about specific rules or scenarios and let Automated Reasoning checks deduce the necessary changes.
Example: Add a missing variable and rule using annotations
aws bedrock start-automated-reasoning-policy-build-workflow \ --policy-arn "arn:aws:bedrock:eusc-de-east-1:111122223333:automated-reasoning-policy/lnq5hhz70wgk" \ --build-workflow-type REFINE_POLICY \ --source-content "{ \"policyDefinition\":EXISTING_POLICY_DEFINITION_JSON, \"workflowContent\": { \"policyRepairAssets\": { \"annotations\": [ { \"addVariable\": { \"name\": \"tenureMonths\", \"type\": \"int\", \"description\": \"The number of complete months the employee has been continuously employed. When users mention years of service, convert to months (for example, 2 years = 24 months).\" } }, { \"addRuleFromNaturalLanguage\": { \"naturalLanguage\": \"If an employee is full-time and has more than 12 months of tenure, then they are eligible for parental leave.\" } } ] } } }"
Annotation examples
Example 1: Fix a missing tenure requirement
Problem: The policy approves parental leave for all full-time employees, but the source document requires 12+ months of tenure.
| Before | After annotation |
|---|---|
|
Rule: No |
New variable: Updated rule: |
Example 2: Fix overlapping variables causing TRANSLATION_AMBIGUOUS
Problem: Two variables (tenureMonths and
monthsOfService) represent the same concept, causing inconsistent
translations.
Annotations:
-
deleteVariableformonthsOfService. -
updateVariablefortenureMonthswith an improved description that covers all the ways users might refer to employment duration. -
updateRulefor any rules that referencedmonthsOfService, changing them to usetenureMonths.
Example 3: Fix a bare assertion causing IMPOSSIBLE results
Problem: The rule (= eligibleForParentalLeave true) is a bare assertion
that makes it impossible for any input to claim the user is not eligible.
Annotations:
-
deleteRulefor the bare assertion. -
addRuleFromNaturalLanguage: "If an employee is full-time and has more than 12 months of tenure, then they are eligible for parental leave."
Use the quality report
The quality report is generated after each build workflow and identifies structural
issues in your policy that can cause test failures. In the console, quality report issues
are surfaced as warnings on the Definitions page. Via the API, use
GetAutomatedReasoningPolicyBuildWorkflowResultAssets with
--asset-type QUALITY_REPORT.
The quality report flags the following issues:
Conflicting rules
Two or more rules reach contradictory conclusions for the same set of conditions.
Conflicting rules cause your policy to return IMPOSSIBLE for all
validation requests that involve the conflicting rules.
Example: Rule A says
(=> isFullTime eligibleForLeave) and Rule B says
(=> (<= tenureMonths 6) (not eligibleForLeave)). For a full-time
employee with 3 months of tenure, Rule A says eligible and Rule B says not eligible —
a contradiction.
Fix: Merge the rules into a single rule with
explicit conditions:
(=> (and isFullTime (> tenureMonths 6)) eligibleForLeave). Or
delete one of the conflicting rules if it was misextracted.
Unused variables
Variables that aren't referenced by any rules. Unused variables add noise to the
translation process and can cause TRANSLATION_AMBIGUOUS results when they
compete with similar active variables for the same concept.
Fix: Delete unused variables unless you plan to add rules that reference them in a future iteration.
Unused type values
Values in a custom type (enum) that aren't referenced by any rules. For example, if
your LeaveType enum has values PARENTAL, MEDICAL, BEREAVEMENT, and
PERSONAL, but no rule references PERSONAL, it's flagged as unused.
Fix: Either add rules that reference the unused value, or remove it from the enum. Unused values can cause translation issues if the input mentions the concept but no rule handles it.
Disjoint rule sets
Groups of rules that don't share any variables. Disjoint rule sets aren't necessarily a problem — your policy may intentionally cover independent topics (for example, leave eligibility and expense reimbursement). However, they can indicate that variables are missing connections between related rules.
When to act: If the disjoint rule sets should be related (for example, they both deal with employee benefits but use different variable names for the same concept), merge the overlapping variables to connect them. If the rule sets are genuinely independent, no action is needed.
Use Kiro CLI for policy refinement
Kiro CLI provides an interactive chat interface for diagnosing and fixing policy issues. It can load your policy definition and quality report, explain why tests are failing, suggest changes, and apply annotations — all through natural language conversation.
Kiro CLI is particularly useful for:
-
Understanding failures. Ask Kiro CLI to load a failing test and explain why it's not returning the expected result. Kiro CLI will analyze the policy definition, the test findings, and the quality report to identify the root cause.
-
Resolving quality report issues. Ask Kiro CLI to summarize the quality report and suggest fixes for conflicting rules, unused variables, and overlapping variable descriptions.
-
Suggesting rule changes. Describe the behavior you expect and ask Kiro CLI to propose the necessary variable and rule changes. Review the suggestions and instruct Kiro CLI to apply them as annotations.
Example workflow:
You: The test with ID test-12345 is not returning the expected result. Can you load the test definition and findings, look at the policy definition, and explain why this test is failing? Kiro: [analyzes the test and policy] The test expects VALID but gets INVALID because rule R3 requires 24 months of tenure, while the test input specifies 18 months. The source document says 12 months. Rule R3 appears to have been misextracted. You: Can you suggest changes to fix this? Kiro: I suggest updating rule R3 to change the tenure threshold from 24 to 12 months. Here's the updated rule: ... You: Looks good. Can you use the annotation APIs to submit these changes? Kiro: [applies annotations via the API]
For complete instructions on setting up and using Kiro CLI with Automated Reasoning policies, see Use Kiro CLI with an Automated Reasoning policy.