Enhanced Container Insights (Classic) metric reference - Amazon CloudWatch


Enhanced Container Insights (Classic) publishes metrics to the ContainerInsights namespace in CloudWatch. The following tables list these metrics by category, with the dimensions and a description for each one.

Note

For OTel Container Insights metrics, see OTel Container Insights (Recommended), which lists open-source metric names.
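In CloudWatch, a metric is identified by three things: its namespace, its name, and one complete set of dimension values. A query therefore has to supply every dimension listed for the metric in the tables below.

The following Python sketch builds one entry in the shape that the CloudWatch `GetMetricData` API accepts in its `MetricDataQueries` list. The helper name `metric_query`, the cluster name `my-cluster`, and the query ID are hypothetical placeholders.

```python
from typing import Dict

NAMESPACE = "ContainerInsights"


def metric_query(query_id: str, metric_name: str,
                 dimensions: Dict[str, str],
                 stat: str = "Average", period: int = 300) -> dict:
    """Build one MetricDataQueries entry for a Container Insights metric (hypothetical helper)."""
    return {
        "Id": query_id,  # GetMetricData requires the ID to start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": NAMESPACE,
                "MetricName": metric_name,
                # Supply every dimension listed for the metric; CloudWatch matches the full set.
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
            },
            "Period": period,
            "Stat": stat,
        },
        "ReturnData": True,
    }


query = metric_query("nodes", "cluster_node_count", {"ClusterName": "my-cluster"})
```

To fetch data, you could pass a list of such entries as `MetricDataQueries` to `boto3.client("cloudwatch").get_metric_data(...)`, together with `StartTime` and `EndTime`.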

Cluster metrics

The following metrics measure the overall health and capacity of your Amazon EKS cluster.

Metric Name | Dimensions | Description
--- | --- | ---
cluster_failed_node_count | ClusterName | The number of failed worker nodes in the cluster.
cluster_node_count | ClusterName | The total number of worker nodes in the cluster.
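The two cluster metrics can be combined into a failed-node percentage using CloudWatch metric math. The sketch below is a minimal example under these assumptions:

- Both metrics are fetched with the `Average` statistic.
- The raw series are hidden (`ReturnData: False`).
- A metric math expression divides one series by the other.

The query IDs `failed`, `total`, and `failedpct` are hypothetical names chosen for this example.

```python
def failed_node_percentage_queries(cluster_name: str, period: int = 300) -> list:
    """Return GetMetricData queries that compute the percentage of failed nodes."""
    dims = [{"Name": "ClusterName", "Value": cluster_name}]

    def stat(query_id: str, metric_name: str) -> dict:
        # Raw metric: fetched so the expression can use it, but not returned itself.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {"Namespace": "ContainerInsights",
                           "MetricName": metric_name,
                           "Dimensions": dims},
                "Period": period,
                "Stat": "Average",
            },
            "ReturnData": False,
        }

    return [
        stat("failed", "cluster_failed_node_count"),
        stat("total", "cluster_node_count"),
        # Metric math expression that CloudWatch evaluates over the two series above.
        {"Id": "failedpct", "Expression": "100 * failed / total",
         "Label": "Failed node %", "ReturnData": True},
    ]


queries = failed_node_percentage_queries("my-cluster")
```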

Node metrics

The following metrics measure resource usage and capacity at the node level.

Metric Name | Dimensions | Description
--- | --- | ---
node_cpu_limit | ClusterName, InstanceId, NodeName | The maximum number of CPU units that can be assigned to the node.
node_cpu_reserved_capacity | ClusterName, InstanceId, NodeName | The percentage of the node's CPU units reserved by pod CPU requests.
node_cpu_usage_total | ClusterName, InstanceId, NodeName | The number of CPU units used on the node.
node_cpu_utilization | ClusterName, InstanceId, NodeName | The percentage of CPU units used on the node (node_cpu_usage_total / node_cpu_limit).
node_filesystem_utilization | ClusterName, InstanceId, NodeName | The percentage of filesystem capacity used on the node.
node_memory_limit | ClusterName, InstanceId, NodeName | The maximum amount of memory, in bytes, that can be assigned to the node.
node_memory_reserved_capacity | ClusterName, InstanceId, NodeName | The percentage of the node's memory reserved by pod memory requests.
node_memory_utilization | ClusterName, InstanceId, NodeName | The percentage of memory used on the node (node_memory_working_set / node_memory_limit).
node_memory_working_set | ClusterName, InstanceId, NodeName | The amount of memory, in bytes, in the working set of the node.
node_network_total_bytes | ClusterName, InstanceId, NodeName | The total number of bytes per second transmitted and received over the network on the node.
node_number_of_running_containers | ClusterName, InstanceId, NodeName | The number of running containers on the node.
node_number_of_running_pods | ClusterName, InstanceId, NodeName | The number of running pods on the node.
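The node utilization metrics relate usage to capacity. The sketch below assumes that node_cpu_utilization is node_cpu_usage_total divided by node_cpu_limit, and that node_memory_utilization is node_memory_working_set divided by node_memory_limit, each expressed as a percentage. The sample values are made up, and `utilization_percent` is a hypothetical helper.

```python
def utilization_percent(used: float, capacity: float) -> float:
    """Percentage of capacity in use; returns 0.0 when capacity is zero or unknown."""
    if capacity <= 0:
        return 0.0
    return 100.0 * used / capacity


# Illustrative values for a single node (not real measurements).
node_cpu_usage_total = 1500.0           # CPU units in use
node_cpu_limit = 4000.0                 # CPU units the node can assign
node_memory_working_set = 6 * 1024**3   # bytes
node_memory_limit = 16 * 1024**3        # bytes

cpu_pct = utilization_percent(node_cpu_usage_total, node_cpu_limit)
mem_pct = utilization_percent(node_memory_working_set, node_memory_limit)
```

With these inputs, both `cpu_pct` and `mem_pct` come out to 37.5.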

Pod metrics

The following metrics measure resource usage at the pod level.

Metric Name | Dimensions | Description
--- | --- | ---
pod_cpu_utilization | ClusterName, Namespace, PodName | The percentage of the node's CPU units used by the pod (pod_cpu_usage_total / node_cpu_limit).
pod_cpu_utilization_over_pod_limit | ClusterName, Namespace, PodName | The percentage of the pod's CPU limit that the pod is using.
pod_cpu_usage_total | ClusterName, Namespace, PodName | The number of CPU units used by the pod.
pod_memory_utilization | ClusterName, Namespace, PodName | The percentage of the node's memory used by the pod (pod_memory_working_set / node_memory_limit).
pod_memory_utilization_over_pod_limit | ClusterName, Namespace, PodName | The percentage of the pod's memory limit that the pod is using.
pod_memory_working_set | ClusterName, Namespace, PodName | The amount of memory, in bytes, in the working set of the pod.
pod_network_rx_bytes | ClusterName, Namespace, PodName | The number of bytes received per second over the network by the pod.
pod_network_tx_bytes | ClusterName, Namespace, PodName | The number of bytes transmitted per second over the network by the pod.
pod_number_of_container_restarts | ClusterName, Namespace, PodName | The total number of container restarts in the pod.
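The two pod CPU percentages use different denominators, so they can tell very different stories about the same pod. This sketch assumes the following:

- pod_cpu_utilization is measured against the node's CPU capacity (node_cpu_limit).
- pod_cpu_utilization_over_pod_limit is measured against the pod's own CPU limit.
- A pod with no CPU limit has no "over pod limit" value, which the sketch represents as `None`.

The function name and the sample numbers are hypothetical.

```python
from typing import Optional, Tuple


def pod_cpu_percentages(pod_cpu_usage_total: float, node_cpu_limit: float,
                        pod_cpu_limit: float) -> Tuple[float, Optional[float]]:
    """Return (percent of node CPU, percent of the pod's CPU limit)."""
    of_node = 100.0 * pod_cpu_usage_total / node_cpu_limit
    # Without a pod CPU limit there is nothing to compare against; this sketch returns None.
    of_limit = 100.0 * pod_cpu_usage_total / pod_cpu_limit if pod_cpu_limit > 0 else None
    return of_node, of_limit


# Illustrative: a pod using 450 CPU units on a 4000-unit node, with a 500-unit limit.
of_node, of_limit = pod_cpu_percentages(450.0, 4000.0, 500.0)
```

Here the pod uses 11.25% of the node but 90% of its own limit. A pod that looks lightly loaded by pod_cpu_utilization can therefore still be close to CPU throttling.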

Container metrics

The following metrics measure resource usage at the container level.

Metric Name | Dimensions | Description
--- | --- | ---
container_cpu_utilization | ClusterName, Namespace, PodName, ContainerName | The percentage of CPU units used by the container.
container_memory_utilization | ClusterName, Namespace, PodName, ContainerName | The percentage of memory used by the container.
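The container metrics carry four dimensions, so listing every container individually is tedious. A CloudWatch SEARCH expression can match all of them at once. The sketch below builds one for a given cluster and namespace, under these assumptions:

- The dimension schema is `{ContainerInsights,ClusterName,Namespace,PodName,ContainerName}`.
- Exact-match terms are quoted.
- The defaults are the `Average` statistic and a 300-second period.

The function name and argument values are hypothetical.

```python
def container_search_expression(metric_name: str, cluster: str, namespace: str,
                                stat: str = "Average", period: int = 300) -> str:
    """Build a CloudWatch SEARCH expression over the container-level dimension set."""
    schema = "{ContainerInsights,ClusterName,Namespace,PodName,ContainerName}"
    # Quoted values request exact matches on the metric name and dimension values.
    terms = f'MetricName="{metric_name}" ClusterName="{cluster}" Namespace="{namespace}"'
    return f"SEARCH('{schema} {terms}', '{stat}', {period})"


expr = container_search_expression("container_cpu_utilization", "my-cluster", "default")
```

You could use the resulting string as the `Expression` of a `GetMetricData` query. CloudWatch then returns one time series for each matching container.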

Service and namespace metrics

The following metrics measure pod counts at the service and namespace level.

Metric Name | Dimensions | Description
--- | --- | ---
service_number_of_running_pods | ClusterName, Namespace, Service | The number of running pods for the service.
namespace_number_of_running_pods | ClusterName, Namespace | The number of running pods in the namespace.
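A common use of service_number_of_running_pods is to alarm when a service has no running pods. The sketch below builds keyword arguments for the CloudWatch `PutMetricAlarm` API. The following choices are illustrative assumptions, not recommended values:

- The alarm naming scheme.
- A 60-second period.
- Three evaluation periods.
- Treating missing data as breaching, since a service with no pods may stop reporting.

```python
def no_running_pods_alarm(cluster: str, namespace: str, service: str) -> dict:
    """Keyword arguments for PutMetricAlarm: alarm when a service has no running pods."""
    return {
        "AlarmName": f"{cluster}-{namespace}-{service}-no-running-pods",  # hypothetical naming
        "Namespace": "ContainerInsights",
        "MetricName": "service_number_of_running_pods",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "Namespace", "Value": namespace},
            {"Name": "Service", "Value": service},
        ],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",
        # Missing datapoints count as breaching (an assumption, see the lead-in).
        "TreatMissingData": "breaching",
    }


params = no_running_pods_alarm("my-cluster", "default", "web")
```

Calling `boto3.client("cloudwatch").put_metric_alarm(**params)` would create the alarm.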

NVIDIA GPU metrics

The following metrics are collected when your cluster has NVIDIA GPU-equipped nodes. These metrics help you monitor GPU compute and memory utilization across nodes, pods, and containers.

Metric Name | Dimensions | Description
--- | --- | ---
node_gpu_utilization | ClusterName, InstanceId, NodeName | The percentage of GPU compute capacity used on the node.
node_gpu_memory_utilization | ClusterName, InstanceId, NodeName | The percentage of GPU memory used on the node.
node_gpu_temperature | ClusterName, InstanceId, NodeName | The GPU temperature on the node, in degrees Celsius.
pod_gpu_utilization | ClusterName, Namespace, PodName | The percentage of GPU compute capacity used by the pod.
container_gpu_utilization | ClusterName, Namespace, PodName, ContainerName | The percentage of GPU compute capacity used by the container.
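GPU nodes are expensive, so node_gpu_utilization is often used to find nodes that sit mostly idle. The sketch below scans a list of per-series results and flags series whose average falls below a threshold. The result shape mirrors the `MetricDataResults` list (`Id`, `Label`, `Values`) in a `GetMetricData` response. The 10% threshold, the node labels, and the sample values are hypothetical.

```python
from statistics import mean
from typing import List


def underused_gpu_series(metric_data_results: List[dict],
                         threshold: float = 10.0) -> List[str]:
    """Return labels of series whose average node_gpu_utilization (%) is below threshold."""
    idle = []
    for series in metric_data_results:
        values = series.get("Values", [])
        # Skip series with no datapoints instead of treating them as idle.
        if values and mean(values) < threshold:
            idle.append(series["Label"])
    return idle


# Made-up results in the MetricDataResults shape.
sample = [
    {"Id": "gpu1", "Label": "node-a", "Values": [2.0, 4.0, 3.0]},
    {"Id": "gpu2", "Label": "node-b", "Values": [85.0, 90.0, 88.0]},
    {"Id": "gpu3", "Label": "node-c", "Values": []},
]
idle = underused_gpu_series(sample)
```

In a real response, the labels come from the `Label` you set on each query, or from CloudWatch's default label.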

AWS Neuron metrics

The following metrics are collected when your cluster has AWS Neuron-equipped instances, such as Inf1 or Inf2. These metrics help you monitor NeuronCore utilization at the node and pod level.

Metric Name | Dimensions | Description
--- | --- | ---
node_neuroncore_utilization | ClusterName, InstanceId, NodeName | The percentage of NeuronCore compute capacity used on the node.
pod_neuroncore_utilization | ClusterName, Namespace, PodName | The percentage of NeuronCore compute capacity used by the pod.

Kubernetes API server metrics

The following metrics are collected from the Kubernetes API server. These metrics help you monitor the health and performance of the control plane.

Metric Name | Dimensions | Description
--- | --- | ---
apiserver_storage_objects | ClusterName | The number of objects stored in etcd by the API server.
apiserver_request_total | ClusterName | The total number of requests received by the API server.
apiserver_request_duration_seconds | ClusterName | The latency of API server requests, in seconds.