Fault tolerance
You can use the following checks for the fault tolerance category.
Check names
Amazon Aurora DB Instance Accessibility
- Description
Checks for cases where an Amazon Aurora DB cluster has both private and public instances.
When your primary instance fails, a replica can be promoted to a primary instance. If that replica is private, users who have only public access would no longer be able to connect to the database after failover. We recommend that all the DB instances in a cluster have the same accessibility.
- Check ID
xuy7H1avtl
- Alert Criteria
Yellow: The instances in an Aurora DB cluster have different accessibility (a mix of public and private).
- Recommended Action
Modify the Publicly Accessible setting of the instances in the DB cluster so that they are all either public or private. For details, see the instructions for MySQL instances at Modifying a DB Instance Running the MySQL Database Engine.
- Additional Resources
- Report columns
  - Status
  - Region
  - Cluster
  - Public DB Instances
  - Private DB Instances
  - Reason
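The alert criterion for this check reduces to a simple test on each cluster. The following is a minimal sketch, not Trusted Advisor's implementation. The function name, the input shape (a mapping from instance identifier to its Publicly Accessible flag), and the `OK` label for unflagged clusters are assumptions made for illustration.

```python
def aurora_accessibility_status(instances):
    """Classify one Aurora DB cluster.

    instances: dict mapping a (hypothetical) instance identifier to True
    when the instance is publicly accessible, False when it is private.
    Returns (status, public_ids, private_ids).
    """
    public_ids = sorted(i for i, is_public in instances.items() if is_public)
    private_ids = sorted(i for i, is_public in instances.items() if not is_public)
    # Yellow only when the cluster mixes public and private instances.
    status = "Yellow" if public_ids and private_ids else "OK"
    return status, public_ids, private_ids


if __name__ == "__main__":
    cluster = {"db-writer": True, "db-reader-1": False}
    print(aurora_accessibility_status(cluster))
```

Returning the two identifier lists alongside the status mirrors the Public DB Instances and Private DB Instances report columns.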
Amazon EBS Snapshots
- Description
Checks the age of the snapshots for your Amazon EBS volumes (either available or in-use). Failures can occur even if Amazon EBS volumes are replicated. Snapshots are persisted to Amazon S3 for durable storage and point-in-time recovery.
- Check ID
H7IgTzjTYb
- Alert Criteria
  - Yellow: The most recent volume snapshot is between 7 and 30 days old.
  - Red: The most recent volume snapshot is more than 30 days old.
  - Red: The volume does not have a snapshot.
- Recommended Action
Create weekly or monthly snapshots of your volumes. For more information, see Creating an Amazon EBS Snapshot.
To automate the creation of EBS snapshots, consider using AWS Backup or Amazon Data Lifecycle Manager.
- Additional Resources
- Report columns
  - Status
  - Region
  - Volume ID
  - Volume Name
  - Snapshot ID
  - Snapshot Name
  - Snapshot Age
  - Volume Attachment
  - Reason
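The three alert criteria for this check can be expressed as a small age classifier. This is a sketch under stated assumptions rather than the service's actual logic: the function name is hypothetical, the 7-day and 30-day boundaries are treated as inclusive for Yellow, and snapshots younger than 7 days are labeled `OK`.

```python
from datetime import datetime, timedelta, timezone


def snapshot_status(latest_snapshot_time, now=None):
    """Classify a volume by the age of its most recent snapshot.

    latest_snapshot_time: timezone-aware datetime of the newest snapshot,
    or None when the volume has no snapshot at all.
    """
    if latest_snapshot_time is None:
        return "Red"  # the volume does not have a snapshot
    now = now or datetime.now(timezone.utc)
    # Fractional age in days, so 30.5 days counts as "more than 30".
    age_days = (now - latest_snapshot_time) / timedelta(days=1)
    if age_days > 30:
        return "Red"
    if age_days >= 7:  # assumption: boundaries are inclusive
        return "Yellow"
    return "OK"


if __name__ == "__main__":
    ten_days_ago = datetime.now(timezone.utc) - timedelta(days=10)
    print(snapshot_status(ten_days_ago))
```

Passing `now` explicitly keeps the function deterministic, which is convenient when checking a batch of volumes against a single reference time.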
Amazon EC2 Availability Zone Balance
- Description
Checks the distribution of Amazon Elastic Compute Cloud (Amazon EC2) instances across Availability Zones in a Region.
Availability Zones are distinct locations that are insulated from failures in other Availability Zones and that provide inexpensive, low-latency network connectivity to other Availability Zones in the same Region. By launching instances in multiple Availability Zones in the same Region, you can help protect your applications from a single point of failure.
- Check ID
wuy7G1zxql
- Alert Criteria
  - Yellow: The Region has instances in multiple zones, but the distribution is uneven (the difference between the highest and lowest instance counts in utilized Availability Zones is greater than 20%).
  - Red: The Region has instances only in a single Availability Zone.
- Recommended Action
Balance your Amazon EC2 instances evenly across multiple Availability Zones. You can do this by launching instances manually or by using Auto Scaling to do it automatically. For more information, see Launch Your Instance and Load Balance Your Auto Scaling Group.
- Additional Resources
- Report columns
  - Status
  - Region
  - Zone a Instances
  - Zone b Instances
  - Zone c Instances
  - Zone e Instances
  - Zone f Instances
  - Reason
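The balance criteria for this check can be sketched as follows. The text does not say what the 20% difference is measured against, so this example assumes it is relative to the highest per-zone count; that choice, the function name, and the `OK` label are all assumptions for illustration.

```python
def az_balance_status(instance_counts, threshold=0.20):
    """Classify one Region from its per-zone instance counts.

    instance_counts: dict mapping a zone label (for example "a") to the
    number of instances running in that zone.
    """
    # Only utilized zones (at least one instance) take part in the check.
    used = [count for count in instance_counts.values() if count > 0]
    if not used:
        return "OK"  # nothing to flag when the Region has no instances
    if len(used) == 1:
        return "Red"  # all instances sit in a single Availability Zone
    highest, lowest = max(used), min(used)
    # Assumption: the difference is taken relative to the highest count.
    if (highest - lowest) / highest > threshold:
        return "Yellow"
    return "OK"


if __name__ == "__main__":
    print(az_balance_status({"a": 10, "b": 5, "c": 0}))
```

If a different baseline is intended (for example the Region's total instance count), only the ratio on the marked line changes.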
Amazon RDS Backups
- Description
Checks for automated backups of Amazon RDS DB instances.
By default, backups are enabled with a retention period of one day. Backups reduce the risk of unexpected data loss and allow for point-in-time recovery.
Note
This check reports the resources that are flagged by the criteria and the total number of resources evaluated, including OK resources. The resources table lists only the flagged resources.
- Check ID
opQPADkZvH
- Alert Criteria
Red: A DB instance has the backup retention period set to 0 days.
- Recommended Action
Set the retention period for the automated DB instance backup to a value from 1 to 35 days, as appropriate to the requirements of your application. See Working With Automated Backups.
- Additional Resources
- Report columns
  - Status
  - Region/AZ
  - DB Instance
  - VPC ID
  - Backup Retention Period
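The criterion and the recommended range for this check can be captured in two small helpers. This is an illustrative sketch only: both function names and the `OK` label are assumptions, and the input is simply the backup retention period in days.

```python
def backup_status(retention_days):
    """Return "Red" when automated backups are disabled (0 days)."""
    if retention_days < 0:
        raise ValueError("retention period cannot be negative")
    return "Red" if retention_days == 0 else "OK"


def is_recommended_retention(retention_days):
    """True when the value falls in the recommended 1 to 35 day range."""
    return 1 <= retention_days <= 35


if __name__ == "__main__":
    # Hypothetical instance names mapped to their retention periods.
    fleet = {"orders-db": 0, "reports-db": 7}
    flagged = [name for name, days in fleet.items() if backup_status(days) == "Red"]
    print(flagged)
```

As the note for this check describes, only the flagged instances would appear in the resources table, which is why the usage example collects them into a separate list.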
Amazon RDS Multi-AZ
- Description
Checks for DB instances that are deployed in a single Availability Zone (AZ).
Multi-AZ deployments enhance database availability by synchronously replicating to a standby instance in a different Availability Zone. During planned database maintenance, or the failure of a DB instance or Availability Zone, Amazon RDS automatically fails over to the standby. This failover allows database operations to resume quickly without administrative intervention. Because Amazon RDS does not support Multi-AZ deployment for Microsoft SQL Server, this check does not examine SQL Server instances.
Note
This check reports the resources that are flagged by the criteria and the total number of resources evaluated, including OK resources. The resources table lists only the flagged resources.
- Check ID
f2iK5R6Dep
- Alert Criteria
Yellow: A DB instance is deployed in a single Availability Zone.
- Recommended Action
If your application requires high availability, modify your DB instance to enable Multi-AZ deployment. See High Availability (Multi-AZ).
- Additional Resources
- Report columns
  - Status
  - Region/AZ
  - DB Instance
  - VPC ID
  - Multi-AZ
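The Multi-AZ criterion, including the stated SQL Server exclusion, can be sketched as below. The function name, the `OK` label, the use of `None` for unexamined instances, and the assumption that SQL Server engine identifiers begin with `sqlserver` are illustrative choices, not confirmed details of the check.

```python
def multi_az_status(engine, multi_az):
    """Classify one DB instance.

    engine: engine identifier string, for example "mysql".
    multi_az: True when the instance has a Multi-AZ deployment.
    Returns None for instances the check does not examine.
    """
    # Assumption: SQL Server engines are identified by this prefix.
    if engine.lower().startswith("sqlserver"):
        return None
    return "OK" if multi_az else "Yellow"


if __name__ == "__main__":
    for engine, flag in [("mysql", False), ("postgres", True), ("sqlserver-se", False)]:
        print(engine, multi_az_status(engine, flag))
```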
VPN Tunnel Redundancy
- Description
Checks the number of tunnels that are active for each of your Site-to-Site VPNs.
A VPN should have two tunnels configured at all times. This provides redundancy in case of outage or planned maintenance of the devices at the AWS endpoint. For some hardware, only one tunnel is active at a time. If a VPN has no active tunnels, charges for the VPN might still apply. For more information, see AWS Site-to-Site VPN User Guide.
- Check ID
S45wrEXrLz
- Alert Criteria
  - Yellow: A VPN has one active tunnel (this is normal for some hardware).
  - Yellow: A VPN has no active tunnels.
- Recommended Action
Be sure that two tunnels are configured for your VPN connection, and that both are active if your hardware supports it. If you no longer need a VPN connection, you can delete it to avoid charges. For more information, see Your customer gateway device or Delete a Site-to-Site VPN connection.
- Additional Resources
- Report columns
  - Status
  - Region
  - VPN ID
  - VPC
  - Virtual Private Gateway
  - Customer Gateway
  - Active Tunnels
  - Reason
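The tunnel criteria for this check amount to counting active tunnels per VPN connection. The sketch below assumes each tunnel's state is available as the string `"UP"` or `"DOWN"`; the function name, that input shape, and the `OK` label are assumptions for illustration.

```python
def vpn_tunnel_status(tunnel_states):
    """Classify one Site-to-Site VPN connection.

    tunnel_states: list of per-tunnel state strings, assumed to be
    "UP" or "DOWN". Returns (status, active_tunnel_count).
    """
    active = sum(1 for state in tunnel_states if state.upper() == "UP")
    if active >= 2:
        return "OK", active
    # Both one active tunnel and no active tunnels are reported as Yellow.
    return "Yellow", active


if __name__ == "__main__":
    print(vpn_tunnel_status(["UP", "DOWN"]))
```

The active count is returned with the status because the two Yellow cases call for different follow-up: one active tunnel can be normal for some hardware, while zero may mean the connection is unused and could be deleted.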