Run CPU stress tests on EC2 instances using FIS - AWS Identity and Access Management
Services or capabilities described in AWS documentation might vary by Region. To see the differences applicable to the AWS European Sovereign Cloud Region, see the AWS European Sovereign Cloud User Guide.

Run CPU stress tests on EC2 instances using FIS

The following code example shows how to:

  • Create IAM roles

  • Create a CloudWatch alarm

  • Create an experiment template

  • Run the experiment

  • Verify the results

  • Clean up resources

Bash
AWS CLI with Bash script
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials repository.

#!/bin/bash # AWS FIS CPU Stress Test Tutorial Script # This script automates the steps in the AWS FIS CPU stress test tutorial # approach using epoch time calculations that work across all Linux distributions # Set up logging LOG_FILE="fis-tutorial-$(date +%Y%m%d-%H%M%S).log" exec > >(tee -a "$LOG_FILE") 2>&1 echo "Starting AWS FIS CPU Stress Test Tutorial Script" echo "Logging to $LOG_FILE" echo "==============================================" # Function to check for errors in command output check_error() { local output=$1 local cmd=$2 if echo "$output" | grep -i "error" > /dev/null; then # Ignore specific expected errors if [[ "$cmd" == *"aws fis get-experiment"* ]] && [[ "$output" == *"ConfigurationFailure"* ]]; then echo "Note: Experiment failed due to configuration issue. This is expected in some cases." return 0 fi echo "ERROR: Command failed: $cmd" echo "Output: $output" cleanup_on_error exit 1 fi } # Function to clean up resources on error cleanup_on_error() { echo "Error encountered. Cleaning up resources..." if [ -n "$EXPERIMENT_ID" ]; then echo "Stopping experiment $EXPERIMENT_ID if running..." aws fis stop-experiment --id "$EXPERIMENT_ID" 2>/dev/null || true fi if [ -n "$TEMPLATE_ID" ]; then echo "Deleting experiment template $TEMPLATE_ID..." aws fis delete-experiment-template --id "$TEMPLATE_ID" || true fi if [ -n "$INSTANCE_ID" ]; then echo "Terminating EC2 instance $INSTANCE_ID..." aws ec2 terminate-instances --instance-ids "$INSTANCE_ID" || true fi if [ -n "$ALARM_NAME" ]; then echo "Deleting CloudWatch alarm $ALARM_NAME..." aws cloudwatch delete-alarms --alarm-names "$ALARM_NAME" || true fi if [ -n "$INSTANCE_PROFILE_NAME" ]; then echo "Removing role from instance profile..." aws iam remove-role-from-instance-profile --instance-profile-name "$INSTANCE_PROFILE_NAME" --role-name "$EC2_ROLE_NAME" || true echo "Deleting instance profile..." aws iam delete-instance-profile --instance-profile-name "$INSTANCE_PROFILE_NAME" || true fi if [ -n "$FIS_ROLE_NAME" ]; then echo "Deleting FIS role policy..." aws iam delete-role-policy --role-name "$FIS_ROLE_NAME" --policy-name "$FIS_POLICY_NAME" || true echo "Deleting FIS role..." aws iam delete-role --role-name "$FIS_ROLE_NAME" || true fi if [ -n "$EC2_ROLE_NAME" ]; then echo "Detaching policy from EC2 role..." aws iam detach-role-policy --role-name "$EC2_ROLE_NAME" --policy-arn "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" || true echo "Deleting EC2 role..." aws iam delete-role --role-name "$EC2_ROLE_NAME" || true fi echo "Cleanup completed." } # Generate unique identifiers for resources TIMESTAMP=$(date +%Y%m%d%H%M%S) FIS_ROLE_NAME="FISRole-${TIMESTAMP}" FIS_POLICY_NAME="FISPolicy-${TIMESTAMP}" EC2_ROLE_NAME="EC2SSMRole-${TIMESTAMP}" INSTANCE_PROFILE_NAME="EC2SSMProfile-${TIMESTAMP}" ALARM_NAME="FIS-CPU-Alarm-${TIMESTAMP}" # Track created resources CREATED_RESOURCES=() echo "Step 1: Creating IAM role for AWS FIS" # Create trust policy file for AWS FIS cat > fis-trust-policy.json << 'EOF' { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "fis.amazonaws.com" }, "Action": "sts:AssumeRole" } ] } EOF # Create IAM role for FIS echo "Creating IAM role $FIS_ROLE_NAME for AWS FIS..." FIS_ROLE_OUTPUT=$(aws iam create-role \ --role-name "$FIS_ROLE_NAME" \ --assume-role-policy-document file://fis-trust-policy.json) check_error "$FIS_ROLE_OUTPUT" "aws iam create-role" CREATED_RESOURCES+=("IAM Role: $FIS_ROLE_NAME") # Create policy document for SSM actions cat > fis-ssm-policy.json << 'EOF' { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ssm:SendCommand", "ssm:ListCommands", "ssm:ListCommandInvocations" ], "Resource": "*" } ] } EOF # Attach policy to the role echo "Attaching policy $FIS_POLICY_NAME to role $FIS_ROLE_NAME..." FIS_POLICY_OUTPUT=$(aws iam put-role-policy \ --role-name "$FIS_ROLE_NAME" \ --policy-name "$FIS_POLICY_NAME" \ --policy-document file://fis-ssm-policy.json) check_error "$FIS_POLICY_OUTPUT" "aws iam put-role-policy" CREATED_RESOURCES+=("IAM Policy: $FIS_POLICY_NAME attached to $FIS_ROLE_NAME") echo "Step 2: Creating IAM role for EC2 instance with SSM permissions" # Create trust policy file for EC2 cat > ec2-trust-policy.json << 'EOF' { "Version":"2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "ec2.amazonaws.com" }, "Action": "sts:AssumeRole" } ] } EOF # Create IAM role for EC2 echo "Creating IAM role $EC2_ROLE_NAME for EC2 instance..." EC2_ROLE_OUTPUT=$(aws iam create-role \ --role-name "$EC2_ROLE_NAME" \ --assume-role-policy-document file://ec2-trust-policy.json) check_error "$EC2_ROLE_OUTPUT" "aws iam create-role" CREATED_RESOURCES+=("IAM Role: $EC2_ROLE_NAME") # Attach SSM policy to the EC2 role echo "Attaching AmazonSSMManagedInstanceCore policy to role $EC2_ROLE_NAME..." EC2_POLICY_OUTPUT=$(aws iam attach-role-policy \ --role-name "$EC2_ROLE_NAME" \ --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore) check_error "$EC2_POLICY_OUTPUT" "aws iam attach-role-policy" CREATED_RESOURCES+=("IAM Policy: AmazonSSMManagedInstanceCore attached to $EC2_ROLE_NAME") # Create instance profile echo "Creating instance profile $INSTANCE_PROFILE_NAME..." PROFILE_OUTPUT=$(aws iam create-instance-profile \ --instance-profile-name "$INSTANCE_PROFILE_NAME") check_error "$PROFILE_OUTPUT" "aws iam create-instance-profile" CREATED_RESOURCES+=("IAM Instance Profile: $INSTANCE_PROFILE_NAME") # Add role to instance profile echo "Adding role $EC2_ROLE_NAME to instance profile $INSTANCE_PROFILE_NAME..." ADD_ROLE_OUTPUT=$(aws iam add-role-to-instance-profile \ --instance-profile-name "$INSTANCE_PROFILE_NAME" \ --role-name "$EC2_ROLE_NAME") check_error "$ADD_ROLE_OUTPUT" "aws iam add-role-to-instance-profile" # Wait for role to propagate echo "Waiting for IAM role to propagate..." sleep 10 echo "Step 3: Launching EC2 instance" # Get the latest Amazon Linux 2 AMI ID echo "Finding latest Amazon Linux 2 AMI..." AMI_ID=$(aws ec2 describe-images \ --owners amazon \ --filters "Name=name,Values=amzn2-ami-hvm-*-x86_64-gp2" "Name=state,Values=available" \ --query "sort_by(Images, &CreationDate)[-1].ImageId" \ --output text) check_error "$AMI_ID" "aws ec2 describe-images" echo "Using AMI: $AMI_ID" # Launch EC2 instance echo "Launching EC2 instance with AMI $AMI_ID..." INSTANCE_OUTPUT=$(aws ec2 run-instances \ --image-id "$AMI_ID" \ --instance-type t2.micro \ --iam-instance-profile Name="$INSTANCE_PROFILE_NAME" \ --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=FIS-Test-Instance}]') check_error "$INSTANCE_OUTPUT" "aws ec2 run-instances" # Get instance ID INSTANCE_ID=$(echo "$INSTANCE_OUTPUT" | grep -i "InstanceId" | head -1 | awk -F'"' '{print $4}') if [ -z "$INSTANCE_ID" ]; then echo "Failed to get instance ID" cleanup_on_error exit 1 fi echo "Launched instance: $INSTANCE_ID" CREATED_RESOURCES+=("EC2 Instance: $INSTANCE_ID") # Enable detailed monitoring echo "Enabling detailed monitoring for instance $INSTANCE_ID..." MONITOR_OUTPUT=$(aws ec2 monitor-instances --instance-ids "$INSTANCE_ID") check_error "$MONITOR_OUTPUT" "aws ec2 monitor-instances" # Wait for instance to be running and status checks to pass echo "Waiting for instance to be ready..." aws ec2 wait instance-running --instance-ids "$INSTANCE_ID" aws ec2 wait instance-status-ok --instance-ids "$INSTANCE_ID" echo "Instance is ready" echo "Step 4: Creating CloudWatch alarm for CPU utilization" # Create CloudWatch alarm echo "Creating CloudWatch alarm $ALARM_NAME..." ALARM_OUTPUT=$(aws cloudwatch put-metric-alarm \ --alarm-name "$ALARM_NAME" \ --alarm-description "Alarm when CPU exceeds 50%" \ --metric-name CPUUtilization \ --namespace AWS/EC2 \ --statistic Maximum \ --period 60 \ --threshold 50 \ --comparison-operator GreaterThanOrEqualToThreshold \ --dimensions "Name=InstanceId,Value=$INSTANCE_ID" \ --evaluation-periods 1) check_error "$ALARM_OUTPUT" "aws cloudwatch put-metric-alarm" CREATED_RESOURCES+=("CloudWatch Alarm: $ALARM_NAME") # Get the alarm ARN echo "Getting CloudWatch alarm ARN..." ALARM_ARN_OUTPUT=$(aws cloudwatch describe-alarms \ --alarm-names "$ALARM_NAME") check_error "$ALARM_ARN_OUTPUT" "aws cloudwatch describe-alarms" ALARM_ARN=$(echo "$ALARM_ARN_OUTPUT" | grep -i "AlarmArn" | head -1 | awk -F'"' '{print $4}') if [ -z "$ALARM_ARN" ]; then echo "Failed to get alarm ARN" cleanup_on_error exit 1 fi echo "Alarm ARN: $ALARM_ARN" # Wait for the alarm to initialize and reach OK state echo "Waiting for CloudWatch alarm to initialize (60 seconds)..." sleep 60 # Check alarm state echo "Checking alarm state..." ALARM_STATE_OUTPUT=$(aws cloudwatch describe-alarms \ --alarm-names "$ALARM_NAME") ALARM_STATE=$(echo "$ALARM_STATE_OUTPUT" | grep -i "StateValue" | head -1 | awk -F'"' '{print $4}') echo "Current alarm state: $ALARM_STATE" # If alarm is not in OK state, wait longer or generate some baseline metrics if [ "$ALARM_STATE" != "OK" ]; then echo "Alarm not in OK state. Waiting for alarm to stabilize (additional 60 seconds)..." sleep 60 # Check alarm state again ALARM_STATE_OUTPUT=$(aws cloudwatch describe-alarms \ --alarm-names "$ALARM_NAME") ALARM_STATE=$(echo "$ALARM_STATE_OUTPUT" | grep -i "StateValue" | head -1 | awk -F'"' '{print $4}') echo "Updated alarm state: $ALARM_STATE" if [ "$ALARM_STATE" != "OK" ]; then echo "Warning: Alarm still not in OK state. Experiment may fail to start." fi fi echo "Step 5: Creating AWS FIS experiment template" # Get the IAM role ARN echo "Getting IAM role ARN for $FIS_ROLE_NAME..." ROLE_ARN_OUTPUT=$(aws iam get-role \ --role-name "$FIS_ROLE_NAME") check_error "$ROLE_ARN_OUTPUT" "aws iam get-role" ROLE_ARN=$(echo "$ROLE_ARN_OUTPUT" | grep -i "Arn" | head -1 | awk -F'"' '{print $4}') if [ -z "$ROLE_ARN" ]; then echo "Failed to get role ARN" cleanup_on_error exit 1 fi echo "Role ARN: $ROLE_ARN" # Get account ID and region ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text) REGION=$(aws configure get region) if [ -z "$REGION" ]; then REGION="us-east-1" # Default to us-east-1 if region not set fi INSTANCE_ARN="arn:aws:ec2:${REGION}:${ACCOUNT_ID}:instance/${INSTANCE_ID}" echo "Instance ARN: $INSTANCE_ARN" # Create experiment template - Fixed JSON escaping issue cat > experiment-template.json << EOF { "description": "Test CPU stress predefined SSM document", "targets": { "testInstance": { "resourceType": "aws:ec2:instance", "resourceArns": ["$INSTANCE_ARN"], "selectionMode": "ALL" } }, "actions": { "runCpuStress": { "actionId": "aws:ssm:send-command", "parameters": { "documentArn": "arn:aws:ssm:$REGION::document/AWSFIS-Run-CPU-Stress", "documentParameters": "{\"DurationSeconds\":\"120\"}", "duration": "PT5M" }, "targets": { "Instances": "testInstance" } } }, "stopConditions": [ { "source": "aws:cloudwatch:alarm", "value": "$ALARM_ARN" } ], "roleArn": "$ROLE_ARN", "tags": { "Name": "FIS-CPU-Stress-Experiment" } } EOF # Create experiment template echo "Creating AWS FIS experiment template..." TEMPLATE_OUTPUT=$(aws fis create-experiment-template --cli-input-json file://experiment-template.json) check_error "$TEMPLATE_OUTPUT" "aws fis create-experiment-template" TEMPLATE_ID=$(echo "$TEMPLATE_OUTPUT" | grep -i "id" | head -1 | awk -F'"' '{print $4}') if [ -z "$TEMPLATE_ID" ]; then echo "Failed to get template ID" cleanup_on_error exit 1 fi echo "Experiment template created with ID: $TEMPLATE_ID" CREATED_RESOURCES+=("FIS Experiment Template: $TEMPLATE_ID") echo "Step 6: Starting the experiment" # Start the experiment echo "Starting AWS FIS experiment using template $TEMPLATE_ID..." EXPERIMENT_OUTPUT=$(aws fis start-experiment \ --experiment-template-id "$TEMPLATE_ID" \ --tags '{"Name": "FIS-CPU-Stress-Run"}') check_error "$EXPERIMENT_OUTPUT" "aws fis start-experiment" EXPERIMENT_ID=$(echo "$EXPERIMENT_OUTPUT" | grep -i "id" | head -1 | awk -F'"' '{print $4}') if [ -z "$EXPERIMENT_ID" ]; then echo "Failed to get experiment ID" cleanup_on_error exit 1 fi echo "Experiment started with ID: $EXPERIMENT_ID" CREATED_RESOURCES+=("FIS Experiment: $EXPERIMENT_ID") echo "Step 7: Tracking experiment progress" # Track experiment progress echo "Tracking experiment progress..." MAX_CHECKS=30 CHECK_COUNT=0 EXPERIMENT_STATE="" while [ $CHECK_COUNT -lt $MAX_CHECKS ]; do EXPERIMENT_INFO=$(aws fis get-experiment --id "$EXPERIMENT_ID") # Don't check for errors here, as we expect some experiments to fail EXPERIMENT_STATE=$(echo "$EXPERIMENT_INFO" | grep -i "status" | head -1 | awk -F'"' '{print $4}') echo "Experiment state: $EXPERIMENT_STATE" if [ "$EXPERIMENT_STATE" == "completed" ] || [ "$EXPERIMENT_STATE" == "stopped" ] || [ "$EXPERIMENT_STATE" == "failed" ]; then # Show the reason for the state REASON=$(echo "$EXPERIMENT_INFO" | grep -i "reason" | head -1 | awk -F'"' '{print $4}') if [ -n "$REASON" ]; then echo "Reason: $REASON" fi break fi echo "Waiting 10 seconds before checking again..." sleep 10 CHECK_COUNT=$((CHECK_COUNT + 1)) done if [ $CHECK_COUNT -eq $MAX_CHECKS ]; then echo "Experiment is taking longer than expected. You can check its status later using:" echo "aws fis get-experiment --id $EXPERIMENT_ID" fi echo "Step 8: Verifying experiment results" # Check CloudWatch alarm state echo "Checking CloudWatch alarm state..." ALARM_STATE_OUTPUT=$(aws cloudwatch describe-alarms --alarm-names "$ALARM_NAME") check_error "$ALARM_STATE_OUTPUT" "aws cloudwatch describe-alarms" echo "$ALARM_STATE_OUTPUT" # Get CPU utilization metrics echo "Getting CPU utilization metrics..." END_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ") # FIXED: Cross-platform compatible way to calculate time 10 minutes ago # This approach uses epoch seconds and basic arithmetic which works on all Linux distributions CURRENT_EPOCH=$(date +%s) TEN_MINUTES_AGO_EPOCH=$((CURRENT_EPOCH - 600)) START_TIME=$(date -u -d "@$TEN_MINUTES_AGO_EPOCH" +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || date -u -r "$TEN_MINUTES_AGO_EPOCH" +"%Y-%m-%dT%H:%M:%SZ") # Create metric query file cat > metric-query.json << EOF [ { "Id": "cpu", "MetricStat": { "Metric": { "Namespace": "AWS/EC2", "MetricName": "CPUUtilization", "Dimensions": [ { "Name": "InstanceId", "Value": "$INSTANCE_ID" } ] }, "Period": 60, "Stat": "Maximum" } } ] EOF METRICS_OUTPUT=$(aws cloudwatch get-metric-data \ --start-time "$START_TIME" \ --end-time "$END_TIME" \ --metric-data-queries file://metric-query.json) check_error "$METRICS_OUTPUT" "aws cloudwatch get-metric-data" echo "CPU Utilization Metrics:" echo "$METRICS_OUTPUT" # Display summary of created resources echo "" echo "===========================================" echo "RESOURCES CREATED" echo "===========================================" for resource in "${CREATED_RESOURCES[@]}"; do echo "- $resource" done echo "===========================================" # Prompt for cleanup echo "" echo "===========================================" echo "CLEANUP CONFIRMATION" echo "===========================================" echo "Do you want to clean up all created resources? (y/n): " read -r CLEANUP_CHOICE if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then echo "Starting cleanup process..." # Stop experiment if still running if [ "$EXPERIMENT_STATE" != "completed" ] && [ "$EXPERIMENT_STATE" != "stopped" ] && [ "$EXPERIMENT_STATE" != "failed" ]; then echo "Stopping experiment $EXPERIMENT_ID..." STOP_OUTPUT=$(aws fis stop-experiment --id "$EXPERIMENT_ID") check_error "$STOP_OUTPUT" "aws fis stop-experiment" echo "Waiting for experiment to stop..." sleep 10 fi # Delete experiment template echo "Deleting experiment template $TEMPLATE_ID..." DELETE_TEMPLATE_OUTPUT=$(aws fis delete-experiment-template --id "$TEMPLATE_ID") check_error "$DELETE_TEMPLATE_OUTPUT" "aws fis delete-experiment-template" # Delete CloudWatch alarm echo "Deleting CloudWatch alarm $ALARM_NAME..." DELETE_ALARM_OUTPUT=$(aws cloudwatch delete-alarms --alarm-names "$ALARM_NAME") check_error "$DELETE_ALARM_OUTPUT" "aws cloudwatch delete-alarms" # Terminate EC2 instance echo "Terminating EC2 instance $INSTANCE_ID..." TERMINATE_OUTPUT=$(aws ec2 terminate-instances --instance-ids "$INSTANCE_ID") check_error "$TERMINATE_OUTPUT" "aws ec2 terminate-instances" echo "Waiting for instance to terminate..." aws ec2 wait instance-terminated --instance-ids "$INSTANCE_ID" # Clean up IAM resources echo "Removing role from instance profile..." REMOVE_ROLE_OUTPUT=$(aws iam remove-role-from-instance-profile \ --instance-profile-name "$INSTANCE_PROFILE_NAME" \ --role-name "$EC2_ROLE_NAME") check_error "$REMOVE_ROLE_OUTPUT" "aws iam remove-role-from-instance-profile" echo "Deleting instance profile..." DELETE_PROFILE_OUTPUT=$(aws iam delete-instance-profile \ --instance-profile-name "$INSTANCE_PROFILE_NAME") check_error "$DELETE_PROFILE_OUTPUT" "aws iam delete-instance-profile" echo "Deleting FIS role policy..." DELETE_POLICY_OUTPUT=$(aws iam delete-role-policy \ --role-name "$FIS_ROLE_NAME" \ --policy-name "$FIS_POLICY_NAME") check_error "$DELETE_POLICY_OUTPUT" "aws iam delete-role-policy" echo "Detaching policy from EC2 role..." DETACH_POLICY_OUTPUT=$(aws iam detach-role-policy \ --role-name "$EC2_ROLE_NAME" \ --policy-arn "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore") check_error "$DETACH_POLICY_OUTPUT" "aws iam detach-role-policy" echo "Deleting FIS role..." DELETE_FIS_ROLE_OUTPUT=$(aws iam delete-role \ --role-name "$FIS_ROLE_NAME") check_error "$DELETE_FIS_ROLE_OUTPUT" "aws iam delete-role" echo "Deleting EC2 role..." DELETE_EC2_ROLE_OUTPUT=$(aws iam delete-role \ --role-name "$EC2_ROLE_NAME") check_error "$DELETE_EC2_ROLE_OUTPUT" "aws iam delete-role" # Clean up temporary files echo "Cleaning up temporary files..." rm -f fis-trust-policy.json ec2-trust-policy.json fis-ssm-policy.json experiment-template.json metric-query.json echo "Cleanup completed successfully." else echo "Cleanup skipped. Resources will remain in your AWS account." echo "You can manually clean up the resources listed above." fi echo "" echo "Script execution completed." echo "Log file: $LOG_FILE"

For a complete list of AWS SDK developer guides and code examples, see Using this service with an AWS SDK. This topic also includes information about getting started and details about previous SDK versions.