[07] Monitoring Flashcards
What CloudWatch metrics does ECS generate for clusters?
CPUReservation, CPUUtilization, MemoryReservation, MemoryUtilization, GPUReservation
What is the CPUReservation metric for clusters?
The percentage of CPU units that are reserved by running EC2 tasks in the cluster
What is the CPUUtilization metric for clusters?
The CPU units in use by tasks in the cluster, as a percentage of the total CPU units registered by the cluster's container instances (only includes EC2 tasks)
What is the MemoryReservation metric for clusters?
The percentage of memory that is reserved by running EC2 tasks in the cluster
What is the MemoryUtilization metric for clusters?
The memory in use by tasks in the cluster, as a percentage of the total memory registered by the cluster's container instances (only includes EC2 tasks)
What is the GPUReservation metric for clusters?
The percentage of total available GPUs that are reserved by running tasks in the cluster
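As a rough illustration of the reservation metrics above, the sketch below computes a cluster-level reservation percentage as reserved capacity divided by the capacity registered by the container instances. The helper name and the sample numbers are hypothetical; the real metric is calculated by ECS itself.

```python
def reservation_percent(reserved_per_instance, registered_per_instance):
    """Return reserved capacity as a percentage of registered capacity.

    Each argument holds one number per container instance (CPU units,
    MiB of memory, or GPUs). Hypothetical helper, not an AWS API.
    """
    total_registered = sum(registered_per_instance)
    if total_registered == 0:
        return 0.0  # empty cluster: nothing registered, nothing reserved
    return 100.0 * sum(reserved_per_instance) / total_registered


# Two instances register 2048 CPU units each; tasks reserve 1024 and 512.
cpu_reservation = reservation_percent([1024, 512], [2048, 2048])
```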
What CloudWatch metrics does ECS generate for services?
CPUUtilization, MemoryUtilization
What is the CPUUtilization metric for services?
The CPU units in use by the service's tasks, as a percentage of the CPU units reserved for those tasks
What is the MemoryUtilization metric for services?
The memory in use by the service's tasks, as a percentage of the memory reserved for those tasks
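A minimal sketch of the service-level calculation, assuming usage and reservation figures are already known per task. The function name and the numbers are made up for illustration; ECS derives the published metric from its own samples.

```python
def service_utilization_percent(used_per_task, reserved_per_task):
    """Return usage as a percentage of what the service's tasks reserved.

    Hypothetical helper: each list holds one value per running task,
    in CPU units or MiB of memory.
    """
    total_reserved = sum(reserved_per_task)
    if total_reserved == 0:
        return 0.0  # no reservation to compare against
    return 100.0 * sum(used_per_task) / total_reserved


# Three tasks each reserve 256 CPU units and use 64, 128 and 192 units.
cpu_utilization = service_utilization_percent([64, 128, 192], [256, 256, 256])
```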
What metrics are generated when Service Connect is enabled?
Metrics similar to those of an Application Load Balancer (ALB), such as request counts and response times
How frequently are CloudWatch metrics published by ECS?
With a 1-minute frequency (ECS internally collects multiple samples and aggregates them)
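Because the metrics arrive once per minute, a 60-second period is the natural granularity to query. The sketch below only builds the arguments for CloudWatch's `get_metric_statistics` call; the function name, cluster name, and service name are placeholders, and the boto3 call is left in a comment since it needs credentials.

```python
from datetime import datetime, timedelta, timezone


def ecs_metric_query(cluster, service, metric="CPUUtilization", minutes=10):
    """Build keyword arguments for a 1-minute-resolution ECS metric query."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ECS",
        "MetricName": metric,
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # matches the 1-minute publishing frequency
        "Statistics": ["Average"],
    }


query = ecs_metric_query("demo-cluster", "demo-service")
# With boto3 installed and credentials configured, this could be used as:
#   boto3.client("cloudwatch").get_metric_statistics(**query)
```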
How can you collect metrics via Prometheus?
By adding an AWS Distro for OpenTelemetry (ADOT) sidecar
What is a way to add an ADOT sidecar to a task definition?
The console has an option to automatically add this sidecar to the task definition
What state is a container instance in when the ECS agent health check passes?
OK
What state is a container instance in when the ECS agent health check fails?
IMPAIRED
How frequently does the ECS agent perform health checks on the underlying EC2 instance?
Every two minutes
What is the IMPAIRED state?
The container instance state reported when the ECS agent health check fails
What is the OK state?
The container instance state reported when the ECS agent health check passes
What can tasks use to access metadata about themselves?
The container metadata file and the task metadata endpoint
What does the container metadata file provide information about?
The container, the task it belongs to, and the container instance it runs on
How is the container metadata file enabled for EC2 tasks?
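Through the ECS agent, by setting ECS_ENABLE_CONTAINER_METADATA=true in its configuration (it is disabled by default)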
How is the container metadata file mounted?
As a Docker volume
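Inside the container, the mounted file's location is exposed through the `ECS_CONTAINER_METADATA_FILE` environment variable. A minimal sketch of reading it follows; the function name is hypothetical, and it assumes the file has already been fully written (its `MetadataFileStatus` field reads `READY` once that is the case).

```python
import json
import os


def load_container_metadata(env=os.environ):
    """Return the parsed container metadata file, or None if not enabled."""
    path = env.get("ECS_CONTAINER_METADATA_FILE")
    if not path:
        return None  # feature disabled, or not running on ECS
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

The `env` parameter exists only so the function can be exercised outside ECS with a plain dictionary.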
What does the task metadata endpoint return information about?
The current container or its task
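For version 4 of the endpoint, the base URI is injected as `ECS_CONTAINER_METADATA_URI_V4`; the base returns container metadata and appending `/task` returns task metadata. The sketch below assumes that version, and both function names are hypothetical.

```python
import json
import os
import urllib.request


def task_metadata_url(env=os.environ, scope="task"):
    """Return the v4 metadata URL for the container or its task."""
    base = env.get("ECS_CONTAINER_METADATA_URI_V4")
    if not base:
        return None  # not running inside an ECS task
    return base if scope == "container" else f"{base}/task"


def fetch_metadata(url, timeout=2):
    """Fetch and parse the JSON document served at the given URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Within a task, `fetch_metadata(task_metadata_url())` would return the task-level document; outside ECS the URL helper returns `None`.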
What allows services on the EC2 instance to access information about the container instance and its agent?
Container introspection
What types of events does ECS emit?
Container instance state changes, task state changes, service action events, and service deployment state changes
What are some examples of container instance state changes that trigger ECS events?
Tasks being stopped or started on the instance, or the agent disconnecting
What are some examples of INFO events related to service actions?
SERVICE_STEADY_STATE, TASKSET_STEADY_STATE, CAPACITY_PROVIDER_STEADY_STATE, SERVICE_DESIRED_COUNT_UPDATED
What are some examples of WARN events related to service actions?
SERVICE_TASK_START_IMPAIRED, SERVICE_DISCOVERY_INSTANCE_UNHEALTHY
What are some examples of ERROR events related to service actions?
SERVICE_DAEMON_PLACEMENT_CONSTRAINT_VIOLATED, ECS_OPERATION_THROTTLED, SERVICE_DISCOVERY_OPERATION_THROTTLED, SERVICE_TASK_PLACEMENT_FAILURE, SERVICE_TASK_CONFIGURATION_FAILURE
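Service action events reach EventBridge with the detail type `ECS Service Action` and carry their severity in `detail.eventType`. One way to route only the WARN and ERROR events is an event pattern like the one sketched below; the function name is hypothetical, and the rule and target that would use the pattern are left out.

```python
import json


def service_action_pattern(levels=("WARN", "ERROR")):
    """Build an EventBridge pattern matching service actions by severity."""
    return {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Service Action"],
        "detail": {"eventType": list(levels)},
    }


# JSON form, as an EventBridge rule would expect it.
pattern_json = json.dumps(service_action_pattern())
```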
What service deployment state changes trigger ECS events?
SERVICE_DEPLOYMENT_IN_PROGRESS, SERVICE_DEPLOYMENT_COMPLETED, SERVICE_DEPLOYMENT_FAILED
What state is a service deployment in while ECS is still working through its steps?
SERVICE_DEPLOYMENT_IN_PROGRESS
What state indicates a successful completion of a service deployment?
SERVICE_DEPLOYMENT_COMPLETED
What state indicates a failed service deployment?
SERVICE_DEPLOYMENT_FAILED
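A handler that receives these events could map the three event names to an outcome, as in the sketch below. It assumes the event arrives as a dictionary with the detail type `ECS Deployment State Change` and the name in `detail.eventName`; the function name and the outcome labels are invented for illustration.

```python
DEPLOYMENT_OUTCOMES = {
    "SERVICE_DEPLOYMENT_IN_PROGRESS": "in progress",
    "SERVICE_DEPLOYMENT_COMPLETED": "succeeded",
    "SERVICE_DEPLOYMENT_FAILED": "failed",
}


def deployment_outcome(event):
    """Return a short outcome label, or None for unrelated events."""
    if event.get("detail-type") != "ECS Deployment State Change":
        return None
    name = event.get("detail", {}).get("eventName")
    return DEPLOYMENT_OUTCOMES.get(name)


# Hypothetical, trimmed-down event for demonstration.
sample_event = {
    "source": "aws.ecs",
    "detail-type": "ECS Deployment State Change",
    "detail": {"eventName": "SERVICE_DEPLOYMENT_FAILED"},
}
outcome = deployment_outcome(sample_event)
```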