Automatic remediation with jobStateTimeLimitActions - AWS Batch
Automatic remediation with jobStateTimeLimitActions

You can optionally configure the jobStateTimeLimitActions parameter through the CreateJobQueue and UpdateJobQueue API actions.

Note

Currently, for job queues connected to Amazon ECS, Amazon EKS, or Fargate compute environments, the only action you can use with jobStateTimeLimitActions.action is to cancel a job.

The jobStateTimeLimitActions parameter specifies a set of actions that AWS Batch performs on jobs that remain in a specific state. You set the time threshold, in seconds, through the maxTimeSeconds field.

When a job has been in the RUNNABLE state with the defined statusReason for longer than maxTimeSeconds, AWS Batch performs the specified action.

For example, you can set the jobStateTimeLimitActions parameter to wait up to 4 hours for any job in the RUNNABLE state that is waiting for sufficient capacity to become available. To do this, set statusReason to CAPACITY:INSUFFICIENT_INSTANCE_CAPACITY and maxTimeSeconds to 14400. After 4 hours, AWS Batch cancels the job, which allows the next job to advance to the head of the job queue.
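The four-hour example above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation. The helper names build_time_limit_action and apply_to_job_queue are hypothetical. The request field names (reason, state, maxTimeSeconds, action) and the CANCEL value are assumptions based on the boto3 Batch client, so check them against the current API reference before use. The call to AWS is wrapped in a function and is not run by the sketch itself.

```python
from typing import Any, Dict, List


def build_time_limit_action(
    reason: str, max_time_seconds: int, action: str = "CANCEL"
) -> Dict[str, Any]:
    """Build one jobStateTimeLimitActions entry (hypothetical helper).

    The field names used here are assumed; verify them against the
    AWS Batch API reference.
    """
    if max_time_seconds <= 0:
        raise ValueError("max_time_seconds must be a positive number of seconds")
    return {
        "reason": reason,            # the statusReason to match
        "state": "RUNNABLE",         # the job state the time limit applies to
        "maxTimeSeconds": max_time_seconds,
        "action": action,            # assumed value for cancelling the job
    }


def apply_to_job_queue(job_queue: str, actions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Send the actions with UpdateJobQueue. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("batch")
    return client.update_job_queue(
        jobQueue=job_queue,
        jobStateTimeLimitActions=actions,
    )


# Wait up to 4 hours (14400 seconds) for capacity before cancelling the job.
four_hours = 4 * 60 * 60
capacity_action = build_time_limit_action(
    "CAPACITY:INSUFFICIENT_INSTANCE_CAPACITY", four_hours
)
print(capacity_action)
```

To apply the entry, you would call apply_to_job_queue("my-queue", [capacity_action]), where my-queue is a placeholder queue name.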

The statusReason values returned by the ListJobs and DescribeJobs API actions are the same values you can define for the jobStateTimeLimitActions.statusReason parameter. However, not all statusReason values support automatic remediation.

The following statusReason values support jobStateTimeLimitActions:

  • CAPACITY:INSUFFICIENT_INSTANCE_CAPACITY

  • MISCONFIGURATION:COMPUTE_ENVIRONMENT_MAX_RESOURCE

  • MISCONFIGURATION:JOB_RESOURCE_REQUIREMENT

  • MISCONFIGURATION:EC2_INSTANCE_CONFIGURATION_UNSUPPORTED

The following statusReason values do not support jobStateTimeLimitActions and require manual investigation:

  • MISCONFIGURATION:SERVICE_ROLE_PERMISSIONS

  • ACTION_REQUIRED

  • UNDETERMINED
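
The two lists above can be encoded as a small lookup, for example to decide whether a stuck job is a candidate for automatic remediation or needs manual investigation. This is a sketch only. The function name classify_status_reason and its return labels are hypothetical, and the prefix match assumes that a statusReason returned by ListJobs or DescribeJobs might carry extra detail after the code.

```python
# statusReason values that support jobStateTimeLimitActions (from the list above).
AUTO_REMEDIABLE = (
    "CAPACITY:INSUFFICIENT_INSTANCE_CAPACITY",
    "MISCONFIGURATION:COMPUTE_ENVIRONMENT_MAX_RESOURCE",
    "MISCONFIGURATION:JOB_RESOURCE_REQUIREMENT",
    "MISCONFIGURATION:EC2_INSTANCE_CONFIGURATION_UNSUPPORTED",
)

# statusReason values that require manual investigation (from the list above).
MANUAL_ONLY = (
    "MISCONFIGURATION:SERVICE_ROLE_PERMISSIONS",
    "ACTION_REQUIRED",
    "UNDETERMINED",
)


def classify_status_reason(status_reason: str) -> str:
    """Return "auto", "manual", or "unknown" for a statusReason string.

    Matching is by prefix, on the assumption that the returned string
    may include additional text after the reason code.
    """
    if status_reason.startswith(AUTO_REMEDIABLE):
        return "auto"
    if status_reason.startswith(MANUAL_ONLY):
        return "manual"
    return "unknown"


print(classify_status_reason("MISCONFIGURATION:JOB_RESOURCE_REQUIREMENT"))
print(classify_status_reason("UNDETERMINED"))
```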